
Algorithms complexity

Parallel computing

Yair Toaff 027481498

Gil Ben Artzi 025010679

Orly Margalit 037616638

Parallel computing - MST

The problem:

Given a graph G= (V , E) with weights.

We need to find a spanning tree of G

with the minimum total weight.


Kruskal algorithm

• Sort the graph's edges by weight.

• In each step add the edge with the minimal weight that doesn’t close a cycle.
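
The two steps above can be sketched as follows. This is a minimal single-processor sketch, not the slides' own code: the vertex labelling 0..n-1, the (weight, u, v) edge tuples and the union-find helper `find` are assumptions made for the illustration.

```python
def kruskal(n, edges):
    """Return (total_weight, chosen_edges) of a minimum spanning forest.

    n     -- number of vertices, labelled 0..n-1 (assumed labelling)
    edges -- iterable of (weight, u, v) tuples (assumed format)
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, chosen = 0, []
    for w, u, v in sorted(edges):      # step 1: sort the edges by weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # step 2: skip edges that close a cycle
            parent[ru] = rv
            total += w
            chosen.append((u, v, w))
    return total, chosen
```

The cycle test uses union-find, which is close to but not exactly the O(1) per step assumed in the complexity slide.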


Complexity

Single processor:

Sorting – O(m log m) = O(n^2 log n)

Each step takes O(1); there are O(n^2) steps

Total – O(n^2 log n)


O(m) processors:

Sorting O(log^2 m)

Each step O(1)

Total O(n^2)


Prim algorithm

• Randomly choose a vertex for tree initialization.

• In every step choose the edge with minimal weight from a vertex in the tree to a vertex not in the tree.


Complexity

Single processor:

Finding the edge in step i takes O(n * i)

Total n + 2n + … + n^2 = O(n^3)


O(n) processors:

There is a processor for each vertex so

every step takes O(n)

Total O(n^2)


O(m) processors

In each step there are more processors than edges so

finding the minimum takes O( log n)

Total O ( n log n)


O(m^2) processors

In each step finding the minimum takes O( 1)

Total O ( n)


Sollin algorithm

• Treat every vertex as a tree

• In each step randomly choose a tree and

find the edge with the minimal weight from

a vertex in the tree to a vertex not in the tree
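
A rough sketch of the tree-merging idea above. It assumes the variant in which every tree picks its lightest outgoing edge in the same round (the form that halves the number of trees per step), distinct edge weights, vertices labelled 0..n-1 and (weight, u, v) edge tuples; all names are illustrative only.

```python
def merge_trees_mst_weight(n, edges):
    """Total MST weight, assuming distinct weights and vertices 0..n-1."""
    comp = list(range(n))              # comp[v] leads to the root of v's tree

    def find(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]
            x = comp[x]
        return x

    total, trees = 0, n
    while trees > 1:
        cheapest = {}                  # tree root -> lightest outgoing edge
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue               # edge lies inside one tree
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, u, v)
        if not cheapest:               # disconnected graph: nothing to merge
            break
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:               # two trees may have picked the same edge
                comp[ru] = rv
                total += w
                trees -= 1
    return total
```

Each pass of the outer loop corresponds to one step of the algorithm; the inner scan over the edges is the part that the extra processors speed up.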


Complexity:

Single processor

Same as Kruskal algorithm


O(n) processors:

There is a processor for every vertex so finding the

minimum takes O( n )

In each step only half of the trees remain so there are

O ( log n ) steps

Total O( n log n)


O(n^2) processors:

There are n processors for every vertex

so finding the minimum takes O(log n)

Total O(log^2 n)


O(n^3) processors:

There are n^2 processors for every vertex

so finding the minimum takes O(1)

Total O(log n )

Merge Sort

MS( p , q , c ) - p, q are indexes, c is the array

If ( p < q )

{

MS( p , (p+q)/2 , c )

MS( (p+q)/2 + 1 , q , c )

merge( p , (p+q)/2 , q , c )

}

Merge Sort

Single processor

In every step the merge takes O(n), there are

O(log n) steps.

Total O( n log n )

Merge Sort

O(n) processors:

In every step the merge is done in parallel

time( MS(n) ) = time( MS(n/2) ) + time( merge of two halves of size n/2 )

By using regular merge we get

O( 1 + 2 + 4 + … + n ) = O( 2^(log n + 1) ) = O(n)

Merge Sort

Parallel merge

The problem: given 2 sorted arrays A,B

with size n/2 we need to merge them

efficiently while keeping them sorted

Merge Sort

Let us define 2 sub arrays:

ODD A = [a1 , a3 , a5 …]

EVEN A = [a0 , a2 , a4 …]

Merge Sort

And 2 functions:

Combine( A , B ) = [ a0 , b0 , a1 , b1 , … ]

Sort-combined( A ) – for each pair a(2i), a(2i+1): if

they are in the right order do nothing, otherwise

swap them

Merge Sort

Parallel merge ( A , B )

{C = parallel merge ( ODD A , EVEN B )

D = parallel merge ( ODD B , EVEN A )

L = combine ( C , D )

Return (sort-combined ( L ) )

}
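
The routine above can be simulated sequentially as a sanity check. This is only a sketch: it assumes both inputs are sorted lists whose common length is a power of two, and it treats an empty list as the base case of the recursion, which the slide does not spell out.

```python
def combine(c, d):
    """Interleave two equal-length lists: [c0, d0, c1, d1, ...]."""
    out = []
    for x, y in zip(c, d):
        out += [x, y]
    return out

def sort_combined(l):
    """Swap every pair (l[2i], l[2i+1]) that is out of order."""
    out = list(l)
    for i in range(0, len(out) - 1, 2):   # pairs are independent: one parallel step
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

def parallel_merge(a, b):
    """Merge two sorted lists whose common length is a power of two."""
    if not a:                              # assumed base case
        return list(b)
    if not b:
        return list(a)
    c = parallel_merge(a[1::2], b[0::2])   # ODD A with EVEN B
    d = parallel_merge(b[1::2], a[0::2])   # ODD B with EVEN A
    return sort_combined(combine(c, d))
```

For example, `parallel_merge([1, 4, 6, 9], [2, 3, 7, 8])` builds C = [2, 4, 7, 9] and D = [1, 3, 6, 8], and the final pairwise swaps give the sorted result.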

Merge Sort

Complexity:

Time ( parallel merge ( n ) ) =

Time ( parallel merge ( n/2) ) + O(1)

= O(log n)

Merge Sort

What is left is to prove the algorithm.

Theorem (0-1 principle): if a compare-and-swap algorithm sorts every array of

0s and 1s, it will sort every array.

Merge Sort

Let us mark the number of ‘1’ in A as 1a

and in B as 1b

The number of ‘1’ in ODD A is 1a /2, rounded up or down

The number of ‘1’ in EVEN A is the remaining 1a /2, rounded the other way

Merge Sort

As a result, the difference between the

number of ‘1’ in C and in D is 0 or 1.

Array L will be sorted except maybe at one

point where the ‘0’ and ‘1’ meet,

so sort-combined will do 1 swap at most.

Merge Sort

Complexity of merge sort using parallel merge:

log 1 + log 2 + log 4 + log 8 + … + log n =

0 + 1 + 2 + 3 + … + log n = O(log^2 n)

Sum

• Input : Array of n elements of type integer.

• Output : Sum of elements.

• One processor - O(n) operations.

• Two processors - Still O(n) operations.

Sum

• What could we do if we have O(n) processors ?
• Parallel algorithm
– For each phase, until only one element is left
• Each processor adds two elements together
• We now have N/2 new elements
• Complexity
– We have done more operations , so what have we gained ?
– Since each phase leaves only half of the elements, we can view the computation as a binary tree where each level holds the current elements. The overall depth is O(log n) levels and each level takes O(1), for a total of O(log n) time.
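
A small sequential simulation of the phases described above. It is a sketch under the assumption that an unpaired last element is simply carried to the next phase; the function name and the returned phase counter are illustrative.

```python
def tree_sum(values):
    """Sum a list by repeated pairwise addition; returns (total, phases)."""
    level = list(values)
    phases = 0
    while len(level) > 1:
        # Each pair would be handled by its own processor in one time step.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:             # odd element out is carried up unchanged
            nxt.append(level[-1])
        level = nxt
        phases += 1
    return (level[0] if level else 0), phases
```

For 8 elements the loop runs 3 phases, which matches the O(log n) depth of the tree.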

Max1 – Max2

• Input : Array of n elements of type integer.
• Output : The first and the second maximum elements in the array
• One processor , 2n operations.
• Two processors , each insertion takes 3 operations (compare to each of the other elements that are candidates) , 2n/3 operations

Max1 – Max2

• Parallel algorithm - recursive solution
– Divide into 2 groups (G1,G2).
– Find MAX for each group (LocalM1,LocalM2)
– If LocalM1>LocalM2
• Create new group G3 := (LocalM2+G1)
• MAX2 must be in G3, since in G2 there is no element that is bigger than LocalM2

Max1 – Max2

• Example
– End of recursion
M1[10] * M1[7] * M1[1] * M1[3] * M1[100] * M1[8] * M1[55] * M1[6]
– Up one phase
M1[10],M2[7] * M1[3],M2[1] * M1[100],M2[8] * M1[55],M2[6]
– Up one phase
M1[10],M2[7,3] * M1[100],M2[8,55]
– The result
M1[100] * M2[10,8,55]
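
The example above can be reproduced with a bottom-up simulation. This is a sketch under assumptions: the input has at least two elements, and each entry keeps the list of elements that lost directly to its current maximum (the M2 candidates of the slide).

```python
def max1_max2(values):
    """Return (largest, second largest) of a list with at least two elements."""
    # Each entry is (current maximum, elements that lost directly to it).
    level = [(v, []) for v in values]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):   # pairs are independent
            (m1, c1), (m2, c2) = level[i], level[i + 1]
            if m1 >= m2:
                nxt.append((m1, c1 + [m2]))     # m2 joins the winner's candidates
            else:
                nxt.append((m2, c2 + [m1]))
        if len(level) % 2:                      # unpaired entry moves up as is
            nxt.append(level[-1])
        level = nxt
    best, candidates = level[0]
    return best, max(candidates)
```

The candidate list of the final winner has about log n entries, which is why the second maximum is cheap to find once Max1 is known.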

Max1 – Max2

• Complexity
– 1 processor
• n operations comparing all elements in the tree for Max1 , log n operations comparing the candidates for Max2 , total (n + log n)
– O(n) processors
• We could find Max1 and rerun the algorithm to find Max2 , each in log n , total of 2 log n.
• However , we can use the previous algorithm and build G3 in parallel , and we get log n for finding Max1 and log log n for finding Max2

Max & Min groups

• Input : 2 groups (G1,G2) of sorted elements
• Output : 2 groups (G1`,G2`), where all elements of one group are bigger than all the elements of the other group
• One processor - Insert the elements into 2 stacks, always compare the stack heads, and insert the minimum into the Min group.
• Complexity - O(n) operations

Max & Min groups

• There is a major subtlety in the previous algorithm when trying to apply it to parallel computing – each element must be compared until we find an element that is larger than it.

• We would like a method that compares each element with as few others as possible; the best is only one comparison per element.

• Any member of the min group is necessarily smaller than at least half of the elements.

• If we could establish this for an element, we could classify it into the right group immediately

• Any suggestions ?

Max & Min groups

• Parallel algorithm
– Insert all elements from G1 into list L1 in reverse order , and all elements of G2 into list L2 in regular order (n elements each , indexed from 0)
– Element j in L1 is bigger than n-j-1 elements of its list
– Element j in L2 is bigger than j elements of its list
– So , by comparing element i in both lists we get
• If L1[i]>L2[i] , L1[i] is bigger than n-i-1 elements in L1 and i+1 elements (including L2[i]) in L2 , total of n elements. L2[i] is smaller than n-i-1 elements of L2 and i+1 elements of L1 , total of n elements.
• And vice versa
– We can now insert the elements immediately into their groups

Max & Min groups

• Example
– Groups
• G1 = 7,10,100,101
• G2 = 1,11,18,99
– Lists
• L1 = 101,100,10,7
• L2 = 1,11,18,99
– Comparing : (101,1),(100,11),(10,18),(7,99)
– Result : G1’ = 101,100,18,99 , G2’ = 1,11,10,7
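
The example above in a few lines of code. A sketch assuming both groups are sorted ascending and have the same length; the function name is illustrative.

```python
def split_max_min(g1, g2):
    """Split two sorted, equal-length lists into (max group, min group)."""
    l1 = g1[::-1]                      # G1 in reverse (descending) order
    l2 = list(g2)                      # G2 in regular (ascending) order
    # One comparison per index i; with a processor per index this is one step.
    big = [max(x, y) for x, y in zip(l1, l2)]
    small = [min(x, y) for x, y in zip(l1, l2)]
    return big, small
```

Note that the two output groups are separated but not sorted internally.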

Max & Min groups

• Complexity
– We compare element i of each list
– Each element has only one comparison
– O(n) processors , O(1) time !
– Can we do better for one processor now ?

Signed elements

• Input : Array of elements , some of them are signed
• Output : 2 arrays of elements , one containing the signed and the other the unsigned , keeping the order between the elements
• One processor
– Make one pass , drop each element into the correct array
– O(n) operations
• Since we need to maintain the order between the elements , we must know for each element how many elements should come before it
• How could we improve the algorithm by adding more processors ?

Signed elements array

• Parallel algorithm
– Create another array (A2) , where each location of a signed element holds 1 and each location of an unsigned element holds 0
– Now we can run the parallel prefix algorithm to obtain each element's position in the destination array
– We can do the same for the unsigned elements

Signed elements array

• Example
– Input : [x1,x2,x3`,x4,x5`,x6,x7`,x8`,x9]
– A2 : [0 , 0 , 1 , 0 , 1 , 0 , 1 , 1 , 0]
– Prefix: [0 , 0 , 1 , 1 , 2 , 2 , 3 , 4 , 4]
– Result: x3` at position 1 , x5` at 2 , x7` at 3 , x8` at 4
• Complexity
– O(n) processors , O(log n) time !
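
A sketch of the prefix-based split. The prefix sums are computed here with the sequential `itertools.accumulate`; a parallel prefix would produce the same positions in O(log n) time. The predicate `is_signed` and the list representation are assumptions for the illustration.

```python
from itertools import accumulate

def split_signed(items, is_signed):
    """Stable split of items into (signed, unsigned) using prefix sums."""
    flags = [1 if is_signed(x) else 0 for x in items]       # the array A2
    pos_signed = list(accumulate(flags))                    # 1-based targets
    pos_unsigned = list(accumulate(1 - f for f in flags))
    signed = [None] * (pos_signed[-1] if items else 0)
    unsigned = [None] * (pos_unsigned[-1] if items else 0)
    for i, x in enumerate(items):      # every write goes to a distinct cell
        if flags[i]:
            signed[pos_signed[i] - 1] = x
        else:
            unsigned[pos_unsigned[i] - 1] = x
    return signed, unsigned
```

Because each element knows its target index in advance, the final loop has no dependencies between iterations and could run with one processor per element.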

Scheduling

• Input : Array of jobs , contains the time for executing each job , and the deadline for finishing it.

• Output : Is there a scheduling satisfying the above condition ?

• Parallel algorithm
– Sort the jobs by deadline
– Create the prefix sums of the execution times
– A scheduling exists only if PrefixExecTime(i)<DeadLine[i] for every i
• Complexity O(n) processors
– O(log^2 n) to sort, O(log n) to do prefix , O(1) to compare
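
A sequential sketch of the feasibility test above. The (exec_time, deadline) tuples are an assumed input format, and the sketch assumes that a job finishing exactly at its deadline is still on time; the slide writes a strict "<", so change the comparison if that is the intended rule.

```python
from itertools import accumulate

def schedulable(jobs):
    """jobs: list of (exec_time, deadline). True if all deadlines can be met."""
    by_deadline = sorted(jobs, key=lambda job: job[1])      # sort by deadline
    finish = accumulate(t for t, _ in by_deadline)          # prefix of exec times
    # Assumption: finishing exactly at the deadline counts as on time.
    return all(f <= d for f, (_, d) in zip(finish, by_deadline))
```

The sort, the prefix and the final comparison are the three stages whose parallel costs are listed in the complexity line.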

CAG - Clique

• Input : CAG
• Output : the maximum clique that exists
• Reminder
– Clique : a set of vertices in which every vertex has an edge to each of the other vertices in the set
– CAG : Circular Arc Graph , a graph where each vertex is an arc on a circle. There is an edge between two vertices iff their arcs share a common segment of the circle

CAG – Clique

• Examples
– Clique [V1,V2,V3] (figure: vertices v1, v2, v3, v4)
– CAG (figure: arcs v1, v2, v3, v4 on a circle)

CAG - Clique

• Parallel algorithm
– Loop through the element list twice
• If Element == start of a vertex , BoundriesArray[i] = +1 ;
• If Element == end of a vertex , and we already passed the start of this vertex , BoundriesArray[i] = -1 ;
– PrefixArray := Prefix ( BoundriesArray )
– MaxClique := Max ( PrefixArray )

CAG - Clique

• Example , CAG from previous slide
– BoundriesArray [ (v1,+),(v2,+),(v1,-),(v4,+),(v3,-),(v4,-),(v2,+),(v1,+),(v3,+),(v2,-),(v1,-) ]
– PrefixArray [1,2,1,2,1,0,1,2,3,2,1]
– MaxClique is 3 !
• Note : We need to loop twice through the list of vertices since we consider only the end of a vertex whose start we have already passed.

CAG – Clique

• Complexity
– One processor , O(n)
– O(n) processors , log n + log n
– O(n^2) processors , log n + O(1)

Exclusive Read & Exclusive Write

• EREW

• Most simple computer

• Only one processor can read/write to a certain memory block at a time

Concurrent Read & Exclusive Write

• CREW

• Only one processor can write to a certain memory block at a time.

• Multiple processors can simultaneously read from a common memory block.

Exclusive Read & Concurrent Write

• ERCW

• Only one processor can read a certain memory block at a time.

• Multiple processors can simultaneously write to a common memory block.

Concurrent Read & Concurrent Write

• CRCW

• Most powerful computer

• Very complex memory control

• Multiple processors can simultaneously read/write to a common memory block

Concurrent Write

Problem:

• Multiple processors write different values to a common memory block , and every processor overwrites the previous processor’s value.

(Figure: Processor 1, Processor 2 and Processor 3 all writing to one MemoryBlock)

Concurrent Write

Solution1:

• Restrict Write – only one agreed value may be written to the memory block.

(Figure: Processor 1, Processor 2 and Processor 3 each write 1; the memory block holds 1)

Concurrent Write

Solution2:

• Combine Write – the value written by every distinct processor is stored in the shared memory block.

(Figure: Processors 1, 2 and 3 write 1, 2 and 4; the memory block holds 1,2,4)

Restrict Write

A good example of Restrict Write is a Boolean problem.

(Figure: inputs X1, X2, X3 and a shared Result cell)

Restrict Write

Initial value: Result = 0

Only the value one is written to Result

result = 0;

For i = 1 to n doip (do in parallel) {

if (Xi == 1)

then result = 1;

}

Max Value - O(n^2) Processors

Reminder:

One processor : O(n) operations.

O(n) processors : O(log n) operations.

O(n^2) processors : ?

We can represent the comparisons between the numbers as a matrix. If x1 < x2 then coordinate (1,2) gets a value of one, else it gets a value of zero.


• A processor is allocated for each cell in the matrix.
• All the processors with “value = 1” write simultaneously to the result cell in their row.

        X1      X2      X3      Result
X1     (1,1)   (1,2)   (1,3)    Row1
X2     (2,1)   (2,2)   (2,3)    Row2
X3     (3,1)   (3,2)   (3,3)    Row3


Total operations with O(n^2) processors : O(1)
– Generating the matrix : O(1) operations (one processor per cell)
– Generating the result column : O(1) operations

        3    6    4    Result
3       0    1    1      1
6       0    0    0      0    (Max Value)
4       0    1    0      1
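
A sequential simulation of the comparison-matrix maximum. It is a sketch assuming a non-empty input; the two nested loops stand in for the n^2 processors, and all concurrent writers store the same value 1, as the Restrict Write rule requires.

```python
def crcw_max(xs):
    """Simulate the comparison-matrix maximum on a non-empty list."""
    n = len(xs)
    # Cell (i, j) holds 1 when xs[i] < xs[j]; one processor per cell.
    matrix = [[1 if xs[i] < xs[j] else 0 for j in range(n)] for i in range(n)]
    result = [0] * n
    for i in range(n):
        for j in range(n):
            if matrix[i][j] == 1:
                result[i] = 1          # every writer stores the same value 1
    # The row that nobody wrote to belongs to the maximum.
    return xs[result.index(0)], matrix, result
```

With the values 3, 6, 4 the result column is 1, 0, 1, so the row of 6 is the one left at zero.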

Sort - O(n^2) Processors

Reminder:

One processor : O(n log n) operations.

O(n) processors : O(log^2 n) operations (merge sort)

O(n^2) processors : ?

• As before, we generate a comparison matrix.
• The result cells will receive the sum of the current row. Each row has O(n) processors, therefore the sum operation takes O(log n) operations.
• The result column gives each element's index in the array sorted in descending order.

Sort - O(n2) Processors

Total operations with O(n^2) processors : O(log n)
– Generating the matrix : O(1) operations (one processor per cell)
– Generating the result column : O(log n) operations

        3    6    4    Result
3       0    1    1      2
6       0    0    0      0
4       0    1    0      1
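
The same matrix can be turned into a sort by summing each row. A sketch assuming distinct values (equal values would receive the same index); the function name is illustrative.

```python
def rank_sort_descending(xs):
    """Sort distinct values in descending order via the comparison matrix."""
    n = len(xs)
    matrix = [[1 if xs[i] < xs[j] else 0 for j in range(n)] for i in range(n)]
    rank = [sum(row) for row in matrix]      # row sum = number of larger elements
    out = [None] * n
    for i in range(n):
        out[rank[i]] = xs[i]                 # distinct values give distinct ranks
    return out, rank
```

Each row sum is the part that costs O(log n) with O(n) processors per row; placing the elements is a single parallel step.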

Multiplication Of Matrix

• Matrices that can be multiplied must obey the dimension law : (n rows x m columns) * (m rows x k columns)

| a11  a12 |   | b11  b12 |   | a11b11 + a12b21   a11b12 + a12b22 |
| a21  a22 | * | b21  b22 | = | a21b11 + a22b21   a21b12 + a22b22 |


Input: Two matrices of size n*n (Mnn)

Output: One matrix Mnn

Total operations with one processor : O(n^3)

• n^2 cells
• Sum of each cell with O(n) variables and one processor, O(n) operations


Total operations with O(n) processors : O(n^2)
• Processor per cell in a column.
• n columns
• Sum of each cell with O(n) variables and one processor, O(n) operations

O(n)sum * ncolumn = O(n^2)


Total operations with O(n^2) processors : O(n)

• n^2 cells

• Processor per cell

• Sum of each cell with O(n) variables and one processor, O(n) operations

O(n)sum * 1cell = O(n)

Each cell is summed simultaneously



Total operations with O(n^3) processors : O(log n)

• n^2 cells

• O(n) processors per cell

• Sum of each cell with O(n) variables and O(n) processors, O(log n) operations

O(log n)sum * 1cell = O(log n)


Multiplication Of Boolean Matrix

Total operations with O(n^3) processors : O(1)

• n^2 cells

• O(n) processors per cell

• Sum (a Boolean OR, done with one concurrent write) of each cell with O(n) variables and O(n) processors, O(1) operations

O(1)sum * 1cell = O(1)


Shortest Path Between Vertexes

Problem:
• Finding if a path exists between 2 vertexes
• Finding the shortest path between 2 vertexes

(Figure: graph with vertices V1, V2, V3, V4 and four arcs labelled 1)

Shortest Path Between Vertexes

• Represent the graph as a matrix Ann.
• If an arc exists between vertex X1 and X2, then coordinates (1,2) & (2,1) get a value of one, otherwise zero.
• Matrix Ann - all the vertexes that are at a distance of one arc from each other.

(Figure: the graph on V1, V2, V3, V4 next to its 4x4 matrix Ann, rows and columns V1..V4, with entries 0 and 1)


• Matrix Ann^2 - all the vertexes that are at a distance of two arcs from each other.

• Ann + Ann^2 = all routes of distance of one and two arcs.

(Figure: the same graph next to a 4x4 matrix, rows and columns V1..V4, with entries 0 and 2)


• Ann + Ann^2 + Ann^3 + … + Ann^n = B - all routes of distance 1 to n arcs.

• Any zero value in matrix B means that no link exists between the two vertexes.

(Figure: the same graph next to a 4x4 matrix, rows and columns V1..V4, with entries 1 and 2)
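
A sketch of the Boolean version of the matrix B, which answers only whether a link exists. The adjacency matrix is given as a list of 0/1 rows; the helper names are illustrative and the loop is sequential, so it shows the logic rather than the parallel running time.

```python
def bool_matmul(a, b):
    """Boolean matrix product: OR over k of (a[i][k] AND b[k][j])."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def reachability(adj):
    """B = A OR A^2 OR ... OR A^n; B[i][j] is True iff a route of 1..n arcs exists."""
    n = len(adj)
    a = [[bool(x) for x in row] for row in adj]
    power = a                           # current power A^k, starting at k = 1
    reach = [row[:] for row in a]       # running OR of A, A^2, ..., A^k
    for _ in range(n - 1):
        power = bool_matmul(power, a)
        reach = [[reach[i][j] or power[i][j] for j in range(n)]
                 for i in range(n)]
    return reach
```

For a hypothetical 4-cycle V1-V2-V3-V4-V1 every entry of the result is True, while an isolated vertex leaves a row and column of False entries.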


Total operations with 1 processor : O(n^4)

• Building of Matrix Ann : O(n) operations

• Multiplication of matrix : O(n^3) operations

• Creation of Ann, Ann^2, Ann^3, …, Ann^n : O(n^4) operations

• Sum of the Matrixes : O(n^3) operations


Total operations with O(n) processors : O(n^3)

• Building of Matrix Ann : O(1) operations

• Multiplication of matrix : O(n^2) operations

• Creation of Ann, Ann^2, Ann^3, …, Ann^n : O(n^3) operations (n multiplications)

• Sum of the Matrixes : O(n^2) operations (ncell * ncolumn)


Total operations with O(n^2) processors: O(n^2)

• Building of Matrix Ann : O(1) operations

• Multiplication of matrix : O(n) operations

• Creation of Ann, Ann^2, Ann^3, …, Ann^n : O(n^2) operations (n multiplications)

• Sum of the Matrixes : O(n) operations (processor per cell)


Total operations with O(n^3) processors: O(n log n)

• Multiplication of matrix : O(log n) operations

• Creation of Ann, Ann^2, Ann^3, …, Ann^n : O(n log n) operations

• Sum of the Matrixes : O(log n) operations (O(n) processors per cell)




• Multiplication of matrix : O(log n) operations with O(n^3) processors

• Creation of Ann, Ann^2, Ann^3, …, Ann^n : O(log^2 n) operations (prefix algorithm)

• Sum of the Matrixes : O(log n) operations

• Boolean Output (link exist True or False) : O(log n) operations