Page 1: Algorithms complexity

Algorithms complexity

Parallel computing

Yair Toaff 027481498

Gil Ben Artzi 025010679

Orly Margalit 037616638

Page 2: Algorithms complexity

Parallel computing - MST

The problem:

Given a graph G = (V, E) with weighted edges.

We need to find a spanning tree

with the minimum total weight.

Page 3: Algorithms complexity

Parallel computing - MST

Kruskal algorithm

• Sort the graph's edges by weight.

• In each step add the edge with the minimal weight that doesn’t close a cycle.
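The two steps above can be sketched sequentially in Python. This is only an illustration: the function name `kruskal`, the `(weight, u, v)` edge format and the union-find structure used for the cycle test are assumptions, not something the slides specify.

```python
def kruskal(n, edges):
    """Sequential sketch. n vertices labelled 0..n-1 (assumed labelling);
    edges is a list of (weight, u, v) tuples (assumed format)."""
    parent = list(range(n))          # union-find forest, one tree per vertex

    def find(x):
        # Follow parent links to the representative, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):    # step 1: edges in order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                 # step 2: the edge does not close a cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree
```

For example, `kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)])` keeps three edges and skips the edge of weight 3, which would close a cycle.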

Page 4: Algorithms complexity

Parallel computing - MST

Complexity

Single processor:

Sorting – O(m log m) = O(n^2 log n)

Each step takes O(1), and there are O(n^2) steps

Total – O(n^2 log n)

Page 5: Algorithms complexity

Parallel computing - MST

O(m) processors:

Sorting O(log^2 m)

Each step O(1)

Total O(n^2)

Page 6: Algorithms complexity

Parallel computing - MST

Prim algorithm

• Choose an arbitrary vertex to initialize the tree.

• In every step choose the edge with minimal weight from a vertex in the tree to a vertex not in the tree.

Page 7: Algorithms complexity

Parallel computing - MST

Complexity

Single processor:

Finding the edge in step i takes O(n * i)

Total n + 2n + … + n^2 = O(n^3)

Page 8: Algorithms complexity

Parallel computing - MST

O(n) processors:

There is a processor for each vertex so

every step takes O(n)

Total O(n2)

Page 9: Algorithms complexity

Parallel computing - MST

O(m) processors

In each step there are more processors than edges so

finding the minimum takes O(log n)

Total O(n log n)

Page 10: Algorithms complexity

Parallel computing - MST

O(m^2) processors

In each step finding the minimum takes O( 1)

Total O ( n)

Page 11: Algorithms complexity

Parallel computing - MST

Sollin algorithm

• Treat every vertex as a tree

• In each step randomly choose a tree and

find the edge with the minimal weight from

a vertex in the tree to a vertex not in the tree

Page 12: Algorithms complexity

Parallel computing - MST

Complexity:

Single processor

Same as Kruskal algorithm

Page 13: Algorithms complexity

Parallel computing - MST

O(n) processors:

There is a processor for every vertex so finding the

minimum takes O( n )

In each step only half of the trees remain so there are

O ( log n ) steps

Total O( n log n)

Page 14: Algorithms complexity

Parallel computing - MST

O(n^2) processors:

There are n processors for every vertex

so finding the minimum takes O(log n)

Total O(log^2 n)

Page 15: Algorithms complexity

Parallel computing - MST

O(n^3) processors:

There are n^2 processors for every vertex

so finding the minimum takes O(1)

Total O(log n )

Page 16: Algorithms complexity

Merge Sort

MS(p, q, c) - p, q are indexes (inclusive), c is the array

If (p < q)

{

MS(p, (p+q)/2, c)

MS((p+q)/2 + 1, q, c)

merge(p, (p+q)/2, q, c)

}

Page 17: Algorithms complexity

Merge Sort

Single processor

In every step the merge takes O(n), there are

O(log n) steps.

Total O( n log n )

Page 18: Algorithms complexity

Merge Sort

O(n) processors:

In every step the merge is done in parallel

time(MS(n)) = time(MS(n/2)) + time(merge(n))

By using a regular (sequential) merge we get

O(1 + 2 + 4 + … + n) = O(2^(log n + 1)) = O(n)

Page 19: Algorithms complexity

Merge Sort

Parallel merge

The problem: given 2 sorted arrays A, B

with size n/2 we need to merge them

efficiently into one sorted array

Page 20: Algorithms complexity

Merge Sort

Let us define 2 sub arrays:

ODD A = [a1 , a3 , a5 …]

EVEN A = [a0 , a2 , a4 …]

Page 21: Algorithms complexity

Merge Sort

And 2 functions:

Combine( A , B ) = [ a0 , b0 , a1 , b1 , … ]

Sort-combined(A) – for each pair a2i, a(2i+1): if

they are in the right order do nothing, else

swap them

Page 22: Algorithms complexity

Merge Sort

Parallel merge ( A , B )

{C = parallel merge ( ODD A , EVEN B )

D = parallel merge ( ODD B , EVEN A )

L = combine ( C , D )

Return (sort-combined ( L ) )

}
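A sequential Python sketch of the recursion above may help to check it on small inputs. It assumes that both arrays have the same length and that this length is a power of two, and it adds a base case for arrays of length 1 that the slide does not show. The function names are chosen here for illustration.

```python
def combine(c, d):
    """Interleave two equal-length lists: [c0, d0, c1, d1, ...]."""
    out = []
    for x, y in zip(c, d):
        out.extend([x, y])
    return out


def sort_combined(l):
    """Compare each pair (l[2i], l[2i+1]) and swap it if out of order.
    The pairs are disjoint, so all of them could be handled in parallel."""
    out = list(l)
    for i in range(0, len(out) - 1, 2):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out


def parallel_merge(a, b):
    """Merge two sorted lists of equal power-of-two length (assumption)."""
    assert len(a) == len(b)
    if len(a) == 1:                            # base case added for this sketch
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = parallel_merge(a[1::2], b[0::2])       # ODD A with EVEN B
    d = parallel_merge(b[1::2], a[0::2])       # ODD B with EVEN A
    return sort_combined(combine(c, d))
```

The two recursive calls are independent, which is what allows them to run at the same time on separate groups of processors.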

Page 23: Algorithms complexity

Merge Sort

Complexity:

Time ( parallel merge ( n ) ) =

Time ( parallel merge ( n/2) ) + O(1)

= O(log n)

Page 24: Algorithms complexity

Merge Sort

What is left is to prove the algorithm.

Theorem (0-1 principle): if a compare-exchange algorithm sorts every array of

0s and 1s, it will sort every array.

Page 25: Algorithms complexity

Merge Sort

Let us denote the number of '1's in A as 1a

and in B as 1b

The number of '1's in ODD A is ⌈1a/2⌉

The number of '1's in EVEN A is ⌊1a/2⌋

Page 26: Algorithms complexity

Merge Sort

As a result of it the difference between the

number of ‘1’ in C and in D is 0 or 1.

Array L will be sorted except maybe one

point where the ‘0’ and ‘1’ meet

sort-combined will do 1 swap at most.

Page 27: Algorithms complexity

Merge Sort

Complexity of merge sort using parallel merge:

log 1 + log 2 + log 4 + log 8 + … + log n =

0 + 1 + 2 + 3 + … + log n = O(log^2 n)

Page 28: Algorithms complexity

Sum

• Input : Array of n elements of type integer.

• Output : Sum of elements.

• One processor - O(n) operations.

• Two processors - Still O(n) operations.

Page 29: Algorithms complexity

Sum

• What could we do if we have O(n) processors?

• Parallel algorithm

  – For each phase, until we have only one element

    • Each processor adds two elements together

    • We now have N/2 new elements

• Complexity

  – We have not reduced the number of operations, so what have we gained?

  – Since each phase leaves only half of the elements, we can view the computation as a binary tree where each level holds the current elements. The overall depth is O(log n) levels, and each level takes O(1), for a total of O(log n) time.
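The phases described above can be simulated sequentially in Python. This is a sketch only: the name `tree_sum` and the handling of an odd leftover element are assumptions, and the list comprehension stands in for one parallel step.

```python
def tree_sum(values):
    """Return (sum, number of phases) using pairwise addition per phase."""
    level = list(values)
    if not level:
        return 0, 0
    phases = 0
    while len(level) > 1:
        # One phase: every addition here is independent of the others,
        # so with enough processors the whole phase takes O(1) time.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:           # odd element out is carried to the next phase
            nxt.append(level[-1])
        level = nxt
        phases += 1
    return level[0], phases
```

With eight elements the loop runs three times, matching the log n depth of the tree.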

Page 30: Algorithms complexity

Max1 – Max2

• Input : Array of n elements of type integer.

• Output : The first and the second maximum elements in the array.

• One processor, 2n operations.

• Two processors, each insertion takes 3 operations (compare to each of the other elements that are candidates), 2n/3 operations.

Page 31: Algorithms complexity

Max1 – Max2

• Parallel algorithm - recursive solution

  – Divide into 2 groups (G1, G2).

  – Find MAX for each group (LocalM1, LocalM2).

  – If LocalM1 > LocalM2

    • Create new group G3 := (LocalM2 + G1)

    • MAX2 must be in G3, since in G2 there is no element that is bigger than LocalM2

Page 32: Algorithms complexity

Max1 – Max2

• Example

  – End of recursion

    M1[10] * M1[7] * M1[1] * M1[3] * M1[100] * M1[8] * M1[55] * M1[6]

  – Up one phase

    M1[10],M2[7] * M1[3],M2[1] * M1[100],M2[8] * M1[55],M2[6]

  – Up one phase

    M1[10],M2[7,3] * M1[100],M2[8,55]

  – The result

    M1[100] * M2[10,8,55]
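The example can be reproduced with a small tournament simulation in Python. It is a sequential sketch under assumptions: the name `max1_max2` is hypothetical, the input needs at least two elements, and each list of beaten elements plays the role of the M2 candidates on the slide.

```python
def max1_max2(values):
    """Return (Max1, Max2, candidates for Max2) via a pairwise tournament."""
    # Each entry is (local maximum, elements that lost directly to it).
    level = [(v, []) for v in values]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):   # pairs are independent
            (a, beaten_a), (b, beaten_b) = level[i], level[i + 1]
            if a >= b:
                nxt.append((a, beaten_a + [b]))
            else:
                nxt.append((b, beaten_b + [a]))
        if len(level) % 2:                       # odd entry moves up unchanged
            nxt.append(level[-1])
        level = nxt
    m1, candidates = level[0]
    # Max2 lost only to Max1, so it is among the log n candidates.
    return m1, max(candidates), candidates
```

For the slide's input `[10, 7, 1, 3, 100, 8, 55, 6]` the candidates are 8, 55 and 10, so only three elements have to be examined for Max2.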

Page 33: Algorithms complexity

Max1 – Max2

• Complexity

  – 1 processor

    • n operations comparing all elements in the tree for Max1, log n operations comparing elements for Max2, total (n + log n)

  – O(n) processors

    • We could find Max1 and rerun the algorithm to find Max2, each in log n, total of 2 log n.

    • However, we can use the previous algorithm and add G3 in parallel, and we get log n for finding Max1, log log n for finding Max2

Page 34: Algorithms complexity

Max & Min groups

• Input : 2 groups (G1, G2) of sorted elements

• Output : 2 groups (G1`, G2`), where all elements in one group are bigger than all the elements in the other group

• One processor - Insert all elements into 2 stacks, always compare the stack heads; the minimum is inserted into the Min group.

• Complexity - O(n) operations

Page 35: Algorithms complexity

Max & Min groups

• There is a major subtlety in the previous algorithm when trying to apply it to parallel computing – each element must be compared until we find an element that is larger than itself.

• We would like a method that compares each element with as few others as possible; the best is only one comparison per element.

• Any member of the min group is necessarily smaller than at least half of the elements.

• If we could conclude this, we could classify the element into the right group immediately

• Any suggestion ?

Page 36: Algorithms complexity

Max & Min groups

• Parallel algorithm

  – Insert all elements from G1 into list L1 in reverse order, and all elements of G2 into list L2 in regular order

  – Element j in L1 (counting from 0) is bigger than n-j-1 elements of its list

  – Element j in L2 is bigger than j elements of its list

  – So, by comparing element i in both lists we get

    • If L1[i] > L2[i], L1[i] is bigger than n-i-1 elements in L1 and i+1 elements (including L2[i]) in L2, a total of n elements. L2[i] is smaller than n-i-1 elements of L2 and i+1 elements of L1, a total of n elements.

    • And vice versa

  – We can now insert the elements immediately into their groups

Page 37: Algorithms complexity

Max & Min groups

• Example

  – Groups

    • G1 = 7,10,100,101

    • G2 = 1,11,18,99

  – Lists

    • L1 = 101,100,10,7

    • L2 = 1,11,18,99

  – Comparing : (101,1),(100,11),(10,18),(7,99)

  – Result : G1' = 101,100,18,99 , G2' = 1,11,10,7
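The example can be checked with a short Python sketch. The name `split_min_max` is hypothetical, and the sketch assumes both groups are sorted in ascending order and have the same size n.

```python
def split_min_max(g1, g2):
    """Split 2n elements into the n largest and the n smallest.
    Assumes g1 and g2 are sorted ascending and have equal length."""
    assert len(g1) == len(g2)
    l1 = g1[::-1]            # G1 in reverse order
    l2 = list(g2)            # G2 in regular order
    # One comparison per position i; all positions are independent,
    # so n processors can do this in a single step.
    max_group = [max(a, b) for a, b in zip(l1, l2)]
    min_group = [min(a, b) for a, b in zip(l1, l2)]
    return max_group, min_group
```

Calling `split_min_max([7, 10, 100, 101], [1, 11, 18, 99])` gives the two groups listed in the example. Note that the output groups are not sorted.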

Page 38: Algorithms complexity

Max & Min groups

• Complexity

  – We compare element i of each list

  – Each element has only one comparison

  – O(n) processors, O(1) time!

  – Can we do better for one processor now?

Page 39: Algorithms complexity

Signed elements

• Input : Array of elements, some of them are signed

• Output : 2 arrays of elements, one containing the signed elements, the other the unsigned, keeping the order between the elements

• One processor

  – Make one pass, drop each element into the correct array

  – O(n) operations

• Since we need to maintain the order between the elements, we must know for each element how many elements should come before it

• How could we improve the algorithm by adding more processors?

Page 40: Algorithms complexity

Signed elements array

• Parallel algorithm

  – Create another array (A2), where each location of a signed element holds 1 and each location of an unsigned element holds 0

  – Now we can run the parallel prefix algorithm to obtain each element's position in the destination array

  – We can do the same for the unsigned elements

Page 41: Algorithms complexity

Signed elements array

• Example

  – Input : [x1, x2, x3`, x4, x5`, x6, x7`, x8`, x9]

  – A2 : [0, 0, 1, 0, 1, 0, 1, 1, 0]

  – Prefix: [0, 0, 1, 1, 2, 2, 3, 4, 4]

  – Result: x3`1, x5`2, x7`3, x8`4

• Complexity

  – O(n) processors, O(log n) time!
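A sequential Python sketch of the same idea follows. The name `split_signed` and the `is_signed` predicate are hypothetical, and `itertools.accumulate` stands in for the parallel prefix step; the point is that once the prefix array exists, every element can be placed independently.

```python
from itertools import accumulate


def split_signed(items, is_signed):
    """Stable split of items into (signed, unsigned) using prefix sums."""
    flags = [1 if is_signed(x) else 0 for x in items]       # the array A2
    prefix = list(accumulate(flags))                         # inclusive prefix sums
    n_signed = prefix[-1] if prefix else 0
    signed = [None] * n_signed
    unsigned = [None] * (len(items) - n_signed)
    for i, x in enumerate(items):        # each iteration is independent
        if flags[i]:
            signed[prefix[i] - 1] = x    # 1-based position from the prefix
        else:
            unsigned[i - prefix[i]] = x  # unsigned elements seen before i
    return flags, prefix, signed, unsigned
```

In the usage below an apostrophe marks a signed element, which is only a convention of this sketch: `split_signed(["x1", "x2", "x3'"], lambda s: s.endswith("'"))`.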

Page 42: Algorithms complexity

Scheduling

• Input : Array of jobs, containing the execution time of each job and the deadline for finishing it.

• Output : Is there a schedule satisfying all the deadlines?

• Parallel algorithm

  – Sort the jobs by deadline

  – Create the prefix sums of the execution times

  – For a schedule to exist, PrefixExecTime(i) < DeadLine[i] must hold for every i

• Complexity, O(n) processors

  – O(log n log n) to sort, O(log n) to do prefix, O(1) to compare
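The test above can be written as a short sequential Python sketch. The name `feasible` and the `(exec_time, deadline)` pair format are assumptions; the strict `<` follows the slide, and `<=` may be the intended comparison if finishing exactly at the deadline is acceptable.

```python
from itertools import accumulate


def feasible(jobs):
    """jobs: list of (exec_time, deadline) pairs (assumed format).
    Returns True if running the jobs in deadline order meets every deadline."""
    ordered = sorted(jobs, key=lambda job: job[1])           # sort by deadline
    finish = list(accumulate(t for t, _ in ordered))          # prefix of exec times
    # Strict comparison as on the slide; all comparisons are independent.
    return all(f < d for f, (_, d) in zip(finish, ordered))
```

For example, `feasible([(2, 10), (1, 3), (3, 7)])` orders the jobs by deadlines 3, 7, 10 with finish times 1, 4, 6.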

Page 43: Algorithms complexity

CAG - Clique

• Input : CAG

• Output : The maximum clique that exists

• Reminder

  – Clique : A set of vertices in which there is an edge between every two vertices

  – CAG : Circular Arc Graph, a graph where each vertex corresponds to an arc on a circle. There is an edge between two vertices iff their arcs share a common segment of the circle

Page 44: Algorithms complexity

CAG – Clique

• Examples

  – Clique [V1, V2, V3]

  – CAG

[Figure: two diagrams, each labelled with the vertices v1, v2, v3, v4]

Page 45: Algorithms complexity

CAG - Clique

• Parallel algorithm

  – Loop through the element list twice

    • If Element == start of a vertex, BoundriesArray[i] = +1;

    • If Element == end of a vertex, and we have already passed the start of this vertex, BoundriesArray[i] = -1;

  – PrefixArray := Prefix(BoundriesArray)

  – MaxClique := Max(PrefixArray)

Page 46: Algorithms complexity

CAG - Clique

• Example, CAG from previous slide

  – BoundriesArray [(v1,+),(v2,+),(v1,-),(v4,+),(v3,-),(v4,-),(v2,+),(v1,+),(v3,+),(v2,-),(v1,-)]

  – PrefixArray [1,2,1,2,1,0,1,2,3,2,1]

  – MaxClique is 3!

• Note : We need to loop twice through the list of vertices, since we consider only the end of a vertex whose start we have already passed.

Page 47: Algorithms complexity

CAG – Clique

• Complexity

  – One processor, O(n)

  – O(n) processors, log n + log n

  – O(n^2) processors, log n + O(1)

Page 48: Algorithms complexity

Exclusive Read & Exclusive Write

• EREW

• Simplest computer

• Only one processor can read/write to a certain memory block at a time

Page 49: Algorithms complexity

Concurrent Read & Exclusive Write

• CREW

• Only one processor can write to a certain memory block at a time.

• Multiple processors can simultaneously read from a common memory block.

Page 50: Algorithms complexity

Exclusive Read & Concurrent Write

• ERCW

• Only one processor can read a certain memory block at a time.

• Multiple processors can simultaneously write to a common memory block.

Page 51: Algorithms complexity

Concurrent Read & Concurrent Write

• CRCW

• Most powerful computer

• Very complex memory control

• Multiple processors can simultaneously read/write to a common memory block

Page 52: Algorithms complexity

Concurrent Write

Problem:

• Multiple processors write different values to a common memory block; every processor overwrites the previous processor's value.

[Figure: Processor 1, Processor 2 and Processor 3 all writing to one MemoryBlock]

Page 53: Algorithms complexity

Concurrent Write

Solution1:

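• Restrict Write – only one common value can be written to the memory block; all processors that write at the same time must write the same value.

[Figure: Processor 1, Processor 2 and Processor 3 each write 1; the memory block holds 1]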

Page 54: Algorithms complexity

Concurrent Write

Solution2:

• Combine Write – a unique value is stored for every distinct processor in the shared memory block.

[Figure: Processor 1 writes 1, Processor 2 writes 2, Processor 3 writes 4; the memory block holds 1,2,4]

Page 55: Algorithms complexity

Restrict Write

A good example of Restrict Write is a Boolean problem.

[Figure: inputs X1, X2, X3 and a Result cell]

Page 56: Algorithms complexity

Restrict Write

Initial value: Result = 0

Only the value one is written to Result

result = 0;

For i = 1 to n do in parallel {

if (Xi == 1)

then result = 1;

}
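The loop above can be imitated in Python with one thread per input. This is only a simulation of the idea, not a real CRCW machine: the name `parallel_or` is hypothetical, and a one-element list plays the role of the shared Result cell. Because every writer stores the same value 1, the order in which the writes happen does not matter.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_or(bits):
    """Boolean OR of bits, with one simulated processor per input."""
    result = [0]                          # the shared Result cell, initially 0

    def processor(i):
        if bits[i] == 1:
            result[0] = 1                 # all writers write the same value

    # max_workers must be at least 1, even for an empty input.
    with ThreadPoolExecutor(max_workers=max(1, len(bits))) as pool:
        list(pool.map(processor, range(len(bits))))
    return result[0]
```

For example, `parallel_or([0, 0, 1])` has a single writer, while `parallel_or([1, 1, 1])` has three writers that all agree.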

Page 57: Algorithms complexity

Max Value - O(n^2) Processors

Reminder:

One processor : O(n) operations.

O(n) processors : O(log n) operations.

O(n^2) processors : ?

We can represent the comparisons between the numbers as a matrix. If x1 < x2 then coordinate (1,2) gets a value of one, else it gets a value of zero.

Page 58: Algorithms complexity

Max Value - O(n^2) Processors

• A processor is allocated for each cell in the matrix.

• All the processors with “value = 1” write simultaneously to the result cell in their row.

        X1      X2      X3      Result
X1      (1,1)   (1,2)   (1,3)   Row1
X2      (2,1)   (2,2)   (2,3)   Row2
X3      (3,1)   (3,2)   (3,3)   Row3

Page 59: Algorithms complexity

Max Value - O(n^2) Processors

Total operations with O(n^2) processors : O(1)

– Generating the matrix : O(1) operations (one processor per cell)

– Generating the result column : O(1) operations

        3   6   4   Result
3       0   1   1   1
6       0   0   0   0
4       0   1   0   1

Max Value: the row whose Result is 0, here 6.
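The matrix method can be simulated sequentially in Python. This is a sketch: the name `max_by_comparison_matrix` is hypothetical, and the nested loops stand in for the n^2 processors, each of which would handle one cell in a single step.

```python
def max_by_comparison_matrix(xs):
    """Return (matrix, result column, maximum) for a non-empty list xs."""
    n = len(xs)
    # Cell (i, j) is 1 when xs[i] < xs[j]; one processor per cell.
    matrix = [[1 if xs[i] < xs[j] else 0 for j in range(n)] for i in range(n)]
    result = [0] * n
    for i in range(n):
        for j in range(n):
            if matrix[i][j] == 1:
                result[i] = 1     # concurrent write of the common value 1
    # A row that received no write belongs to an element smaller than nothing.
    winners = [xs[i] for i in range(n) if result[i] == 0]
    return matrix, result, winners[0]
```

With the input `[3, 6, 4]` this reproduces the matrix and the result column shown above.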

Page 60: Algorithms complexity

Sort - O(n^2) Processors

Reminder:

One processor : O(n log n) operations.

O(n) processors : O(log^2 n) operations (merge sort)

O(n^2) processors : ?

• As before, we generate a comparison matrix.

• The result cells will receive the sum of the current row. Each row has O(n) processors, therefore the sum operation takes O(log n) operations.

• The result column gives each element's index in the sorted array, in descending order.

Page 61: Algorithms complexity

Sort - O(n^2) Processors

Total operations with O(n^2) processors : O(log n)

– Generating the matrix : O(1) operations (one processor per cell)

– Generating the result column : O(log n) operations

        3   6   4   Result
3       0   1   1   2
6       0   0   0   0
4       0   1   0   1
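A sequential Python sketch of this sort follows. The name `rank_sort_descending` is hypothetical, and the sketch assumes that all elements are distinct, since equal elements would receive the same index.

```python
def rank_sort_descending(xs):
    """Return (result column, sorted list) using the comparison matrix.
    Assumes the elements of xs are distinct."""
    n = len(xs)
    # Cell (i, j) is 1 when xs[i] < xs[j], exactly as in the max-value method.
    matrix = [[1 if xs[i] < xs[j] else 0 for j in range(n)] for i in range(n)]
    # Row sum = number of elements larger than xs[i] = its descending index.
    ranks = [sum(row) for row in matrix]
    out = [None] * n
    for i in range(n):            # every element is placed independently
        out[ranks[i]] = xs[i]
    return ranks, out
```

For `[3, 6, 4]` the result column is `[2, 0, 1]`, so 6 goes to index 0, 4 to index 1 and 3 to index 2.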

Page 62: Algorithms complexity

Multiplication Of Matrix

• Matrixes that can be multiplied must obey the dimension law : RnCm * RmCk

| a11  a12 |   | b11  b12 |   | a11b11 + a12b21   a11b12 + a12b22 |
| a21  a22 | * | b21  b22 | = | a21b11 + a22b21   a21b12 + a22b22 |

Page 63: Algorithms complexity

Multiplication Of Matrix

Input: Two matrixes of size n*n (Mnn)

Output: One matrix Mnn

Total operations with one processor : O(n^3)

• n^2 cells

• Sum of each cell with O(n) variables and one processor, O(n) operations

Page 64: Algorithms complexity

Multiplication Of Matrix

Total operations with O(n) processors : O(n^2)

• Processor per cell in a column.

• n columns

• Sum of each cell with O(n) variables and one processor, O(n) operations

O(n)sum * ncolumn = O(n^2)

Page 65: Algorithms complexity

Multiplication Of Matrix

Total operations with O(n^2) processors : O(n)

• n^2 cells

• Processor per cell

• Sum of each cell with O(n) variables and one processor, O(n) operations

O(n)sum * 1cell = O(n)

Each cell is summed simultaneously

Page 66: Algorithms complexity

Multiplication Of Matrix

Total operations with O(n^3) processors : O(log n)

• n^2 cells

• O(n) processors per cell

• Sum of each cell with O(n) variables and O(n) processors, O(log n) operations

O(log n)sum * 1cell = O(log n)

Each cell is summed simultaneously

Page 67: Algorithms complexity

Multiplication Of Boolean Matrix

Total operations with O(n^3) processors : O(1)

• n^2 cells

• O(n) processors per cell

• Sum of each cell with O(n) variables and O(n) processors, O(1) operations (the Boolean sum is an OR, computed with a concurrent write)

O(1)sum * 1cell = O(1)

Each cell is summed simultaneously

Page 68: Algorithms complexity

Shortest Path Between Vertexes

Problem:

• Finding if a path exists between 2 vertexes

• Finding the shortest path between 2 vertexes

[Figure: graph on the vertices V1, V2, V3, V4 with four edges of weight 1]

Page 69: Algorithms complexity

Shortest Path Between Vertexes

• Represent the graph as a matrix Ann.

• If an arc exists between vertex X1 and X2, then coordinates (1,2) & (2,1) get a value of one, otherwise zero.

• Matrix Ann - all the vertexes that are of one arc distance from each other.

        V1  V2  V3  V4
V1      0   1   0   1
V2      1   0   1   0
V3      0   1   0   1
V4      1   0   1   0

[Figure: graph on the vertices V1, V2, V3, V4 with four edges of weight 1]

Page 70: Algorithms complexity

Shortest Path Between Vertexes

• Matrix Ann^2 - all the vertexes that are of two arcs distance from each other.

• Ann + Ann^2 = all routes of distance of one and two arcs.

        V1  V2  V3  V4
V1      2   0   2   0
V2      0   2   0   2
V3      2   0   2   0
V4      0   2   0   2

[Figure: graph on the vertices V1, V2, V3, V4 with four edges of weight 1]

Page 71: Algorithms complexity

Shortest Path Between Vertexes

• Ann + Ann^2 + Ann^3 + … + Ann^n = B - all routes of distance 1 to n arcs.

• Any zero value in matrix B means that no link exists between the two vertexes.

        V1  V2  V3  V4
V1      2   1   2   1
V2      1   2   1   2
V3      2   1   2   1
V4      1   2   1   2

[Figure: graph on the vertices V1, V2, V3, V4 with four edges of weight 1]
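For the "does a path exist" question, only zero versus non-zero entries of B matter, so the powers can be computed with Boolean arithmetic. The Python sketch below takes that route, which is an assumption of this example rather than the slide's exact computation; the names `bool_mat_mul` and `reachability` are hypothetical, and the loops are sequential stand-ins for the parallel steps.

```python
def bool_mat_mul(x, y):
    """Boolean product: cell (i, j) is 1 if some k has x[i][k] and y[k][j]."""
    n = len(x)
    return [[1 if any(x[i][k] and y[k][j] for k in range(n)) else 0
             for j in range(n)] for i in range(n)]


def reachability(a):
    """Boolean version of B = A + A^2 + ... + A^n for adjacency matrix a."""
    n = len(a)
    power = [row[:] for row in a]     # current power A^k, starting with k = 1
    b = [row[:] for row in a]
    for _ in range(n - 1):
        power = bool_mat_mul(power, a)
        b = [[b[i][j] | power[i][j] for j in range(n)] for i in range(n)]
    return b
```

For a graph with one edge between the first two of three vertices, `reachability([[0, 1, 0], [1, 0, 0], [0, 0, 0]])` leaves zeros in the third row and column, showing that the third vertex is linked to nothing.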

Page 72: Algorithms complexity

Shortest Path Between Vertexes

Total operations with 1 processor : O(n^4)

• Building of Matrix Ann : O(n^2) operations

• Multiplication of matrix : O(n^3) operations

• Creation of Ann, Ann^2, Ann^3, …, Ann^n : O(n^4) operations

• Sum of the Matrixes : O(n^3) operations

Page 73: Algorithms complexity

Shortest Path Between Vertexes

Total operations with O(n) processors : O(n^3)

• Building of Matrix Ann : O(n) operations

• Multiplication of matrix : O(n^2) operations

• Creation of Ann, Ann^2, Ann^3, …, Ann^n : O(n^3) operations

• Sum of the Matrixes : O(n^2) operations (ncell * ncolumn)

Page 74: Algorithms complexity

Shortest Path Between Vertexes

Total operations with O(n^2) processors : O(n^2)

• Building of Matrix Ann : O(1) operations

• Multiplication of matrix : O(n) operations

• Creation of Ann, Ann^2, Ann^3, …, Ann^n : O(n^2) operations

• Sum of the Matrixes : O(n) operations (processor per cell)

Page 75: Algorithms complexity

Shortest Path Between Vertexes

Total operations with O(n^3) processors : O(n log n)

• Building of Matrix Ann : O(1) operations

• Multiplication of matrix : O(log n) operations

• Creation of Ann, Ann^2, Ann^3, …, Ann^n : O(n log n) operations

• Sum of the Matrixes : O(log n) operations (O(n) processors per cell)

Page 76: Algorithms complexity

Shortest Path Between Vertexes

Total operations with O(n^4) processors : O(log^2 n)

• Building of Matrix Ann : O(1) operations

• Multiplication of matrix : O(log n) operations with O(n^3) processors

• Creation of Ann, Ann^2, Ann^3, …, Ann^n : O(log^2 n) operations (prefix algorithm)

• Sum of the Matrixes : O(log n) operations

• Boolean Output (link exists, True or False) : O(log n) operations