Algorithms complexity
Parallel computing
Yair Toaff 027481498
Gil Ben Artzi 025010679
Orly Margalit 037616638
Parallel computing - MST
The problem:
Given a weighted graph G = (V , E) with n = |V| vertices and m = |E| edges,
we need to find a spanning tree
with the minimum total weight.
Parallel computing - MST
Kruskal algorithm
• Sort the graph's edges by weight.
• In each step add the edge with the minimal weight that doesn’t close a cycle.
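The two steps above can be sketched in Python as follows. This is only an illustration: the function name `kruskal`, the `(weight, u, v)` edge format and the union-find structure used for the cycle test are assumptions, not part of the slides, and the graph is assumed to be connected.

```python
def kruskal(n, edges):
    """Return (total_weight, chosen_edges) for vertices 0..n-1.

    edges is a list of (weight, u, v) tuples (hypothetical input format).
    """
    parent = list(range(n))  # union-find forest, used to detect cycles

    def find(x):
        # Follow parent links to the representative of x's tree.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total, chosen = 0, []
    for weight, u, v in sorted(edges):  # edges by increasing weight
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge does not close a cycle
            parent[root_u] = root_v
            total += weight
            chosen.append((u, v, weight))
    return total, chosen


example_edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
print(kruskal(4, example_edges))
```

The edge (0, 2) of weight 3 is skipped because vertices 0 and 2 are already connected through vertex 1.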
Parallel computing - MST
Complexity
Single processor:
Sorting – O(m log m) = O(n^2 log n)
For each step O(1), there are O(n^2) steps
Total – O(n^2 log n)
Parallel computing - MST
O(m) processors:
Sorting O(log^2 m)
Each step O(1)
Total O(n^2)
Parallel computing - MST
Prim algorithm
• Randomly choose a vertex for tree initialization.
• In every step choose the edge with minimal weight from a vertex in the tree to a vertex not in the tree.
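A minimal single-processor sketch of these two steps, written so that step i scans every pair (tree vertex, other vertex) as in the O(n * i) bound of the slides. The name `prim`, the weight-matrix input with `None` for a missing edge, and the fixed start vertex (instead of a random one) are assumptions made for the example; the graph is assumed to be connected.

```python
def prim(weight, start=0):
    """weight[u][v] is the edge weight, or None when there is no edge."""
    n = len(weight)
    in_tree = {start}
    total, chosen = 0, []
    while len(in_tree) < n:
        best = None  # lightest edge leaving the tree found so far
        for u in in_tree:
            for v in range(n):
                w = weight[u][v]
                if v not in in_tree and w is not None:
                    if best is None or w < best[0]:
                        best = (w, u, v)
        w, u, v = best
        in_tree.add(v)
        total += w
        chosen.append((u, v, w))
    return total, chosen


N = None
example_weights = [
    [N, 1, 3, N],
    [1, N, 2, N],
    [3, 2, N, 4],
    [N, N, 4, N],
]
print(prim(example_weights))
```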
Parallel computing - MST
Complexity
Single processor:
Find the edge in step i – O(n * i)
Total n + 2n + … + n^2 = O(n^3)
Parallel computing - MST
O(n) processors:
There is a processor for each vertex so
every step takes O(n)
Total O(n^2)
Parallel computing - MST
O(m) processors
In each step there are more processors than edges so
finding the minimum takes O( log n)
Total O ( n log n)
Parallel computing - MST
O(m^2) processors
In each step finding the minimum takes O( 1)
Total O ( n)
Parallel computing - MST
Sollin algorithm
• Treat every vertex as a tree
• In each step randomly choose a tree and
find the edge with the minimal weight from
a vertex in the tree to a vertex not in the tree
Parallel computing - MST
Complexity:
Single processor
Same as Kruskal algorithm
Parallel computing - MST
O(n) processors:
There is a processor for every vertex so finding the
minimum takes O( n )
In each step only half of the trees remain so there are
O ( log n ) steps
Total O( n log n)
Parallel computing - MST
O(n^2) processors:
There are n processors for every vertex
so finding the minimum takes O(log n)
Total O(log^2 n)
Parallel computing - MST
O(n^3) processors:
There are n^2 processors for every vertex
so finding the minimum takes O(1)
Total O(log n)
Merge Sort
MS( p , q , c ) - p , q are indexes , c is the array
If ( p < q )
{
MS( p , (p+q)/2 , c )
MS( (p+q)/2 + 1 , q , c )
merge( p , (p+q)/2 , q , c )
}
Merge Sort
Single processor
In every step the merge takes O(n), there are
O(log n) steps.
Total O( n log n )
Merge Sort
O(n) processors:
In every step the merge is done in parallel
time(MS(n)) = time(MS(n/2)) + time(merge(n))
By using regular merge we get
O(1 + 2 + 4 + … + n) = O(2^(log n + 1)) = O(n)
Merge Sort
Parallel merge
The problem: given 2 sorted arrays A,B
with size n/2 we need to merge them
efficiently while keeping them sorted
Merge Sort
Let us define 2 sub arrays:
ODD A = [a1 , a3 , a5 …]
EVEN A = [a0 , a2 , a4 …]
Merge Sort
And 2 functions:
Combine( A , B ) = [ a0 , b0 , a1 , b1 , … ]
Sort-combined( A ) – for each pair a2i , a(2i+1): if
they are in the right order do nothing, else
swap them
Merge Sort
Parallel merge ( A , B )
{C = parallel merge ( ODD A , EVEN B )
D = parallel merge ( ODD B , EVEN A )
L = combine ( C , D )
Return (sort-combined ( L ) )
}
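The routine above can be simulated sequentially in Python. This is a sketch under stated assumptions: both inputs are sorted, have the same length, and that length is a power of two; the name `parallel_merge` and the base case for arrays of length 1 are choices made for the example. On a parallel machine the two recursive calls and all the pair comparisons would each run at the same time.

```python
def parallel_merge(a, b):
    """Merge two sorted lists of equal power-of-two length."""
    if len(a) == 1:
        # Base case: a single comparison orders the two elements.
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = parallel_merge(a[1::2], b[0::2])  # ODD A with EVEN B
    d = parallel_merge(b[1::2], a[0::2])  # ODD B with EVEN A
    # combine(C, D) = [c0, d0, c1, d1, ...]
    merged = [x for pair in zip(c, d) for x in pair]
    # sort-combined: swap each pair (2i, 2i+1) that is out of order.
    for i in range(0, len(merged), 2):
        if merged[i] > merged[i + 1]:
            merged[i], merged[i + 1] = merged[i + 1], merged[i]
    return merged


print(parallel_merge([7, 10, 100, 101], [1, 11, 18, 99]))
```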
Merge Sort
Complexity:
Time ( parallel merge ( n ) ) =
Time ( parallel merge ( n/2) ) + O(1)
= O(log n)
Merge Sort
What is left is to prove the algorithm.
Theorem (0-1 principle): if an algorithm built only from
compare-and-swap steps sorts every array of 0s and 1s, it will sort every array.
Merge Sort
Let us mark the number of ‘1’ in A as 1a
and in B as 1b
The number of ‘1’ in ODD A is ⌈1a / 2⌉
The number of ‘1’ in EVEN A is ⌊1a / 2⌋
Merge Sort
As a result, the number of ‘1’ in C and
the number of ‘1’ in D differ by at most 1.
Array L will be sorted except maybe at one
point where the ‘0’ and ‘1’ meet, so
sort-combined will do 1 swap at most.
Merge Sort
Complexity of merge sort using parallel merge:
log 1 + log 2 + log 4 + log 8 + … + log n =
0 + 1 + 2 + 3 + … + log n = O(log^2 n)
Sum
• Input : Array of n elements of type integer.
• Output : Sum of elements.
• One processor - O(n) operations.
• Two processors - Still O(n) operations.
Sum
• What could we do if we have O(n) processors ?
• Parallel algorithm
– In each phase , until only one element remains
• Each processor adds two elements together
• We now have N/2 new elements
• Complexity
– We have done more operations , so what have we gained ?
– Since in each phase we stay with only half of the elements, we can view it as a binary tree where each level represents the new current elements; the overall depth is O(log n) levels. Each level in the tree takes O(1), for a total of O(log n) time.
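The phases of the parallel sum can be simulated one level at a time. In this sketch the name `tree_sum` and the returned phase counter are assumptions added for illustration; each list comprehension stands for one phase in which all processors add their pair simultaneously.

```python
def tree_sum(values):
    """Return (sum, number_of_phases) using pairwise addition."""
    level = list(values)
    if not level:
        return 0, 0
    phases = 0
    while len(level) > 1:
        # One phase: "processor" i adds elements 2i and 2i+1.
        pairs = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            pairs.append(level[-1])  # an unpaired element moves up unchanged
        level = pairs
        phases += 1
    return level[0], phases


print(tree_sum([1, 2, 3, 4, 5, 6, 7, 8]))
```

For 8 elements the loop runs 3 times, matching the O(log n) depth of the tree.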
Max1 – Max2
• Input : Array of n elements of type integer.
• Output : The first and the second maximum elements in the array
• One processor , 2n operations.
• Two processors , each insertion takes 3 operations (compare with each of the other elements that are candidates) , 2n/3 operations
Max1 – Max2
• Parallel algorithm - recursive solution
– Divide into 2 groups (G1,G2).
– Find MAX for each group (LocalM1,LocalM2)
– If LocalM1>LocalM2
• Create new group G3 := (LocalM2+G1)
• MAX2 must be in G3, since in G2 there is no element that is bigger than LocalM2
Max1 – Max2
• Example
– End of recursion
M1[10] * M1[7] * M1[1] * M1[3] * M1[100] * M1[8] * M1[55] * M1[6]
– Up one phase
M1[10],M2[7] * M1[3],M2[1] * M1[100],M2[8] * M1[55],M2[6]
– Up one phase
M1[10],M2[7,3] * M1[100],M2[8,55]
– The result
M1[100] * M2[10,8,55]
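The example can be reproduced with a small tournament simulation. This is a sketch, not the slides' exact procedure: the name `max1_max2` and the representation of each group as a pair (local maximum, elements that lost directly to it) are assumptions, and the input is assumed to hold at least two elements.

```python
def max1_max2(values):
    """Return (maximum, second maximum) of a list with >= 2 elements."""
    # Each entry is (M1, M2 candidates): a local maximum and the
    # elements that were compared with it directly and lost.
    groups = [(v, []) for v in values]
    while len(groups) > 1:
        next_groups = []
        for i in range(0, len(groups) - 1, 2):
            (a, lost_a), (b, lost_b) = groups[i], groups[i + 1]
            if a >= b:
                next_groups.append((a, lost_a + [b]))
            else:
                next_groups.append((b, lost_b + [a]))
        if len(groups) % 2:
            next_groups.append(groups[-1])  # unpaired group moves up
        groups = next_groups
    best, candidates = groups[0]
    return best, max(candidates)


print(max1_max2([10, 7, 1, 3, 100, 8, 55, 6]))
```

Only the O(log n) elements that lost directly to the overall maximum need to be examined for Max2.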
Max1 – Max2
• Complexity
– 1 processor
• n operations comparing all elements in the tree for Max1 , log n operations comparing elements for Max2 , total (n + log n)
– O(n) processors
• We could find Max1 and rerun the algorithm to find Max2 , each in log n , total of 2 log n.
• However , we can use the previous algorithm and add G3 in parallel , and we get log n for finding Max1 , log log n for finding Max2
Max & Min groups
• Input : 2 groups (G1,G2) of sorted elements
• Output : 2 groups (G1`,G2`), where all elements in one group are bigger than all the elements in the other group
• One processor - Insert all elements into 2 stacks, always compare the stack heads, the minimum is inserted into the Min group.
• Complexity - O(n) operations
Max & Min groups
• There is a major subtlety in the previous algorithm when trying to apply it to parallel computing – each element must be compared until we find an element that is larger than itself.
• We would like a method that compares each element with the others as few times as possible; the best is only one comparison per element.
• Any member of the min group is necessarily smaller than at least half of the elements.
• If we could conclude this, we could classify the element into the right group immediately
• Any suggestion ?
Max & Min groups
• Parallel algorithm
– Insert all elements from G1 into list L1 in reverse order , and all elements of G2 into list L2 in regular order (n elements each , indexed from 0)
– Element j in L1 is bigger than n-j-1 elements of its list
– Element j in L2 is bigger than j elements of its list
– So , by comparing element i in both lists we get
• If L1[i]>L2[i] , L1[i] is bigger than n-i-1 elements in L1 and i+1 elements in L2 (including L2[i]) , total of n elements. L2[i] is smaller than n-i-1 elements of L2 and i+1 elements of L1 , total of n elements.
• And vice versa
– We can now insert the elements immediately into their groups
Max & Min groups
• Example
– Groups
• G1 = 7,10,100,101
• G2 = 1,11,18,99
– Lists
• L1 = 101,100,10,7
• L2 = 1,11,18,99
– Comparing : (101,1),(100,11),(10,18),(7,99)
– Result : G1’ = 101,100,18,99 , G2’ = 1,11,10,7
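The example can be checked with a short simulation. The name `split_max_min` is an assumption, and the loop stands in for n processors that each perform one comparison at the same time; both groups are assumed to be sorted in ascending order and to have the same size.

```python
def split_max_min(g1, g2):
    """Return (max_group, min_group) using one comparison per position."""
    l1 = g1[::-1]  # G1 in reverse order
    l2 = list(g2)  # G2 in regular order
    max_group, min_group = [], []
    for a, b in zip(l1, l2):  # "processor" i compares L1[i] with L2[i]
        if a > b:
            max_group.append(a)
            min_group.append(b)
        else:
            max_group.append(b)
            min_group.append(a)
    return max_group, min_group


print(split_max_min([7, 10, 100, 101], [1, 11, 18, 99]))
```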
Max & Min groups
• Complexity
– We compare element i of each list
– Each element has only one comparison
– O(n) processors , O(1) time !
– Can we do better for one processor now ?
Signed elements
• Input : Array of elements , some of them are signed
• Output : 2 arrays of elements , one containing the signed , the other the unsigned , keeping the order between the elements
• One processor
– Make one pass , drop each element into the correct array
– O(n) operations
• Since we need to maintain the order between the elements , we must know for each element how many elements should come before it
• How could we improve the algorithm by adding more processors ?
Signed elements array
• Parallel algorithm
– Create another array (A2) of elements , where in each location of a signed element insert 1 and in each location of an unsigned element insert 0
– Now we can run the parallel prefix algorithm and obtain each element's position in the destination array
– We can do the same for the unsigned elements
Signed elements array
• Example
– Input : [x1 , x2 , x3` , x4 , x5` , x6 , x7` , x8` , x9]
– A2 : [0 , 0 , 1 , 0 , 1 , 0 , 1 , 1 , 0]
– Prefix : [0 , 0 , 1 , 1 , 2 , 2 , 3 , 4 , 4]
– Result : x3` at position 1 , x5` at position 2 , x7` at position 3 , x8` at position 4
• Complexity
– O(n) processors , O(log n) time !
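A possible simulation of this example is shown below. The names `parallel_prefix` and `split_signed` are assumptions, as is the convention that a trailing backtick marks a signed element. The prefix routine uses the doubling scheme (each round adds the value `step` places to the left), which is one common way to obtain O(log n) rounds; the slides do not say which prefix algorithm they use.

```python
def parallel_prefix(values):
    """Inclusive prefix sums in O(log n) simulated rounds."""
    sums = list(values)
    step = 1
    while step < len(sums):
        # One round: every position is updated from the previous round.
        sums = [sums[i] + sums[i - step] if i >= step else sums[i]
                for i in range(len(sums))]
        step *= 2
    return sums


def split_signed(elements, is_signed):
    """Stable split into (signed, unsigned) using prefix sums."""
    flags = [1 if is_signed(x) else 0 for x in elements]  # array A2
    prefix = parallel_prefix(flags)
    signed_count = prefix[-1] if prefix else 0
    signed = [None] * signed_count
    unsigned = [None] * (len(elements) - signed_count)
    for i, x in enumerate(elements):  # each element placed independently
        if flags[i]:
            signed[prefix[i] - 1] = x
        else:
            unsigned[i - prefix[i]] = x
    return signed, unsigned


items = ["x1", "x2", "x3`", "x4", "x5`", "x6", "x7`", "x8`", "x9"]
print(parallel_prefix([0, 0, 1, 0, 1, 0, 1, 1, 0]))
print(split_signed(items, lambda name: name.endswith("`")))
```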
Scheduling
• Input : Array of jobs , containing the execution time of each job and the deadline for finishing it.
• Output : Is there a schedule satisfying all the deadlines ?
• Parallel algorithm
– Sort the jobs by deadline
– Compute the prefix sums of the execution times in that order
– A schedule exists iff PrefixExecTime(i) <= DeadLine[i] for every i
• Complexity , O(n) processors
– O(log^2 n) to sort , O(log n) to do prefix , O(1) to compare
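The test above can be sketched sequentially as follows. The name `schedule_exists` and the `(exec_time, deadline)` tuple format are assumptions; the sketch also assumes a single machine, that all jobs are available at time 0, and that finishing exactly at the deadline counts as on time.

```python
from itertools import accumulate


def schedule_exists(jobs):
    """jobs is a list of (exec_time, deadline) pairs."""
    ordered = sorted(jobs, key=lambda job: job[1])  # sort by deadline
    # Prefix sums of execution times = finish time of each job.
    finish_times = accumulate(exec_time for exec_time, _ in ordered)
    return all(finish <= deadline
               for finish, (_, deadline) in zip(finish_times, ordered))


print(schedule_exists([(2, 5), (1, 2), (3, 6)]))
print(schedule_exists([(3, 3), (2, 4)]))
```

In the first call the jobs finish at times 1, 3 and 6 against deadlines 2, 5 and 6; in the second the later job would finish at time 5, after its deadline of 4.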
CAG - Clique
• Input : CAG
• Output : the maximum clique that exists
• Reminder
– Clique : a set of vertices in which every two vertices are joined by an edge
– CAG : Circular Arc Graph , a graph where each vertex is an arc on a circle. There is an edge between two vertices iff their arcs share a common segment on the circle
CAG – Clique
• Examples
– Clique [V1,V2,V3]
– CAG
[Figures: the clique and the CAG drawn on the vertices v1 , v2 , v3 , v4]
CAG - Clique
• Parallel algorithm
– Loop through the element list twice
• If Element == start of a vertex , BoundriesArray[i] = +1 ;
• If Element == end of a vertex , and we already passed the start of this vertex , BoundriesArray[i] = -1 ;
– PrefixArray := Prefix ( BoundriesArray )
– MaxClique := Max ( PrefixArray )
CAG - Clique
• Example , CAG from previous slide
– BoundriesArray [ (v1,+),(v2,+),(v1,-),(v4,+),(v3,-),(v4,-),(v2,+),(v1,+),(v3,+),(v2,-),(v1,-) ]
– PrefixArray [1,2,1,2,1,0,1,2,3,2,1]
– MaxClique is 3 !
• Note : We need to loop twice through the list of vertices since we consider only the end of a vertex whose start we have already passed.
CAG – Clique
• Complexity
– One processor , O(n)
– O(n) processors , log n + log n
– O(n^2) processors , log n + O(1)
Exclusive Read & Exclusive Write
• EREW
• Simplest computer
• Only one processor can read/write to a certain memory block at a time
Concurrent Read & Exclusive Write
• CREW
• Only one processor can write to a certain memory block at a time.
• Multiple processors can simultaneously read from a common memory block.
Exclusive Read & Concurrent Write
• ERCW
• Only one processor can read a certain memory block at a time.
• Multiple processors can simultaneously write to a common memory block.
Concurrent Read & Concurrent Write
• CRCW
• Most powerful computer
• Very complex memory control
• Multiple processors can simultaneously read/write to a common memory block
Concurrent Write
Problem:
• When multiple processors write different values to a common memory block , every processor overwrites the previous processor’s value.
[Figure: Processor 1 , Processor 2 and Processor 3 all writing to one MemoryBlock]
Concurrent Write
Solution 1:
• Restrict Write – all processors must write the same value to the memory block.
[Figure: Processor 1 , Processor 2 and Processor 3 each write 1 ; the memory block holds 1]
Concurrent Write
Solution 2:
• Combine Write – the value written by every distinct processor is stored in the shared memory block.
[Figure: Processor 1 , Processor 2 and Processor 3 write 1 , 2 and 4 ; the memory block holds 1,2,4]
Restrict Write
A good example of Restrict Write is a Boolean problem.
[Figure: cells X1 , X2 , X3 and a shared Result cell]
Restrict Write
Initial value: Result = 0
Only the value one is written to Result , so Result ends as the OR of X1 … Xn
result = 0;
For i = 1 to n doip (do in parallel) {
if (Xi == 1)
then result = 1;
}
Max Value - O(n^2) Processors
Reminder:
One processor : O(n) operations.
O(n) processors : O(log n) operations.
O(n^2) processors : ?
We can represent the comparison between numbers as a matrix. If x1< x2 then coordinate (1,2) gets a value of one, else it gets a value of zero.
Max Value - O(n^2) Processors
• A processor is allocated for each cell in the matrix.
• All the processors with “value = 1” write simultaneously to the result cell in their row.
        X1      X2      X3      Result
X1      (1,1)   (1,2)   (1,3)   Row1
X2      (2,1)   (2,2)   (2,3)   Row2
X3      (3,1)   (3,2)   (3,3)   Row3
Max Value - O(n^2) Processors
Total operations with O(n^2) processors : O(1)
– Generating the Matrix : O(1) operations (one processor per cell)
– Generating the result column : O(1) operations
        3   6   4   Result
3       0   1   1   1
6       0   0   0   0   <- Max Value
4       0   1   0   1
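A sequential simulation of this scheme for the values 3 , 6 , 4 is sketched below. The name `crcw_max` is an assumption, the input is assumed to be non-empty, and the nested loops stand in for n^2 processors: every cell that holds 1 writes the same value 1 to its row's result cell, which is exactly what Restrict Write allows.

```python
def crcw_max(xs):
    """Return (maximum, result_column) for a non-empty list."""
    n = len(xs)
    # One simulated processor per cell: cell (i, j) is 1 when xs[i] < xs[j].
    smaller = [[1 if xs[i] < xs[j] else 0 for j in range(n)]
               for i in range(n)]
    result = [0] * n
    for i in range(n):
        for j in range(n):
            if smaller[i][j] == 1:
                result[i] = 1  # concurrent write of the common value 1
    # A row whose result stayed 0 is smaller than nothing: it is a maximum.
    winners = [xs[i] for i in range(n) if result[i] == 0]
    return winners[0], result


print(crcw_max([3, 6, 4]))
```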
Sort - O(n^2) Processors
Reminder:
One processor : O(n log n) operations.
O(n) processors : O(log^2 n) operations (merge sort)
O(n^2) processors : ?
• As before , we generate a comparison matrix.
• The result cells will receive the sum of the current row. Each row has O(n) processors , therefore the sum operation takes O(log n) operations.
• The result column represents the index of each element in the sorted array in descending order.
Sort - O(n^2) Processors
Total operations with O(n^2) processors : O(log n)
– Generating the Matrix : O(1) operations (one processor per cell)
– Generating the result column : O(log n) operations
        3   6   4   Result
3       0   1   1   2
6       0   0   0   0
4       0   1   0   1
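The same matrix can be turned into a sort, as sketched below for the values 3 , 6 , 4. The name `rank_sort_descending` is an assumption, and the sketch assumes all elements are distinct (equal elements would receive the same index). `sum(row)` stands for the O(log n) parallel sum of each row.

```python
def rank_sort_descending(xs):
    """Return (sorted_descending, result_column) for distinct elements."""
    n = len(xs)
    # Comparison matrix: cell (i, j) is 1 when xs[i] < xs[j].
    smaller = [[1 if xs[i] < xs[j] else 0 for j in range(n)]
               for i in range(n)]
    # Row sum = number of elements bigger than xs[i] = its final index.
    ranks = [sum(row) for row in smaller]
    output = [None] * n
    for x, rank in zip(xs, ranks):
        output[rank] = x
    return output, ranks


print(rank_sort_descending([3, 6, 4]))
```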
Multiplication Of Matrix
• Matrices that can be multiplied must obey the dimension law : RnCm * RmCk (n rows and m columns times m rows and k columns)
| a11 a12 |   | b11 b12 |   | a11b11 + a12b21   a11b12 + a12b22 |
| a21 a22 | * | b21 b22 | = | a21b11 + a22b21   a21b12 + a22b22 |
Multiplication Of Matrix
Input: Two matrices of size n*n (Mnn)
Output: One matrix Mnn
Total operations with one processor : O(n^3)
• n^2 cells
• Sum of each cell with O(n) variables and one processor , O(n) operations
Multiplication Of Matrix
Total operations with O(n) processors : O(n^2)
• Processor per cell in a column.
• n columns
• Sum of each cell with O(n) variables and one processor , O(n) operations
O(n) sum * n columns = O(n^2)
Multiplication Of Matrix
Total operations with O(n^2) processors : O(n)
• n^2 cells
• Processor per cell
• Sum of each cell with O(n) variables and one processor, O(n) operations
O(n) sum * 1 cell = O(n)
Each cell is summed simultaneously
Multiplication Of Matrix
Total operations with O(n^3) processors : O(log n)
• n^2 cells
• O(n) processors per cell
• Sum of each cell with O(n) variables and O(n) processors, O(log n) operations
O(log n) sum * 1 cell = O(log n)
Each cell is summed simultaneously
Multiplication Of Boolean Matrix
Total operations with O(n^3) processors : O(1)
• n^2 cells
• O(n) processors per cell
• Sum of each cell with O(n) variables and O(n) processors, O(1) operations (for Boolean values the sum is an OR , done with a Restrict Write)
O(1) sum * 1 cell = O(1)
Each cell is summed simultaneously
Shortest Path Between Vertexes
Problem:
• Finding if a path exists between 2 vertexes
• Finding the shortest path between 2 vertexes
[Figure: a graph on V1 , V2 , V3 , V4 with four arcs of weight 1]
Shortest Path Between Vertexes
• Represent the graph as a matrix Ann.
• If an arc exists between vertex X1 and X2 , then coordinates (1,2) & (2,1) get a value of one , otherwise zero.
• Matrix Ann - all the vertexes that are of one arc distance from each other.
        V1  V2  V3  V4
V1      0   1   0   1
V2      1   0   1   0
V3      0   1   0   1
V4      1   0   1   0
Shortest Path Between Vertexes
• Matrix Ann^2 - all the vertexes that are of two arcs distance from each other.
• Ann + Ann^2 = all routes of distance of one and two arcs.
Ann^2 for the example graph:
        V1  V2  V3  V4
V1      2   0   2   0
V2      0   2   0   2
V3      2   0   2   0
V4      0   2   0   2
Shortest Path Between Vertexes
• Ann + Ann^2 + Ann^3 + … + Ann^n = B - all routes of distance 1 to n arcs.
• Any zero value in matrix B means that no link exists between the two vertexes.
Ann + Ann^2 for the example graph:
        V1  V2  V3  V4
V1      2   1   2   1
V2      1   2   1   2
V3      2   1   2   1
V4      1   2   1   2
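The matrix powers can be computed with a short single-processor sketch. The names `mat_mul` and `route_counts`, and the example matrix for a ring V1-V2-V3-V4-V1, are assumptions made for illustration. Besides B, the sketch records for each pair the smallest k for which the power k of the matrix has a non-zero entry, which is the number of arcs on a shortest route (for a vertex and itself this is the shortest closed route).

```python
def mat_mul(a, b):
    """Plain O(n^3) product of two n*n matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def route_counts(adj):
    """Return (B, shortest): B = A + A^2 + ... + A^n, and shortest[i][j]
    is the smallest k with A^k[i][j] > 0 (None when no route exists)."""
    n = len(adj)
    power = adj  # holds A^k during round k
    total = [[0] * n for _ in range(n)]
    shortest = [[None] * n for _ in range(n)]
    for k in range(1, n + 1):
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
                if power[i][j] > 0 and shortest[i][j] is None:
                    shortest[i][j] = k
        power = mat_mul(power, adj)
    return total, shortest


ring = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
b_matrix, arcs = route_counts(ring)
print(b_matrix[0], arcs[0])
```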
Shortest Path Between Vertexes
Total operations with 1 processor : O(n^4)
• Building of Matrix Ann : O(n^2) operations
• Multiplication of matrix : O(n^3) operations
• Creation of Ann , Ann^2 , Ann^3 , … , Ann^n : O(n^4) operations
• Sum of the Matrixes : O(n^3) operations
Shortest Path Between Vertexes
Total operations with O(n) processors : O(n^3)
• Building of Matrix Ann : O(n) operations
• Multiplication of matrix : O(n^2) operations
• Creation of Ann , Ann^2 , Ann^3 , … , Ann^n : O(n^3) operations
• Sum of the Matrixes : O(n^2) operations (n cells * n columns)
Shortest Path Between Vertexes
Total operations with O(n^2) processors : O(n^2)
• Building of Matrix Ann : O(1) operations
• Multiplication of matrix : O(n) operations
• Creation of Ann , Ann^2 , Ann^3 , … , Ann^n : O(n^2) operations
• Sum of the Matrixes : O(n) operations (processor per cell)
Shortest Path Between Vertexes
Total operations with O(n^3) processors : O(n log n)
• Building of Matrix Ann : O(1) operations
• Multiplication of matrix : O(log n) operations
• Creation of Ann , Ann^2 , Ann^3 , … , Ann^n : O(n log n) operations
• Sum of the Matrixes : O(log n) operations (O(n) processors per cell)
Shortest Path Between Vertexes
Total operations with O(n^4) processors : O(log^2 n)
• Building of Matrix Ann : O(1) operations
• Multiplication of matrix : O(log n) operations with O(n^3) processors
• Creation of Ann , Ann^2 , Ann^3 , … , Ann^n : O(log^2 n) operations (prefix algorithm)
• Sum of the Matrixes : O(log n) operations
• Boolean Output (link exists True or False) : O(log n) operations