
CS 591 Computational Geometry and Applications

Subscriber: Tsykynovskyy, Yevgen A. Lecturer: Shang-Hua Teng

Review:

At the end of the last lecture we discussed models of parallel computation. The model involves multiple processors that can intercommunicate, e.g. by using shared memory. It was established that the transition from sequential algorithms to parallel algorithms is not always direct. The focus of designing parallel algorithms is to run as many tasks simultaneously as possible, thus reducing the critical path of the algorithm.

Some definitions of concepts and complexity classes

An independent set of a graph G=(V,E) is defined as follows: two nodes are independent iff they are not directly connected. A collection of nodes is an independent set iff every pair of nodes in the collection is independent.

The problem of finding a maximum independent set of a graph is NP-hard (its decision version is NP-complete). That problem is connected to the problem of graph coloring: a set of nodes colored with the same color must be independent, by definition of the graph coloring problem. Graph coloring itself is related to parallel algorithm design: if a graph is constructed in such a way that each node is a task, and two nodes are connected iff the tasks they represent cannot be executed in parallel, then the sets of nodes of the same color represent sets of tasks that can be done in parallel. The fewer colors you need to color the graph, the more tasks you can do in parallel.
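The link between coloring and parallel scheduling can be sketched in a few lines of Python. This is only an illustration: the task names and the conflict graph are hypothetical, and greedy coloring is a heuristic that gives a valid coloring, not necessarily one with the fewest colors.

```python
def greedy_coloring(conflicts):
    """Give each task the smallest color not used by a conflicting task.

    `conflicts` maps each task to the set of tasks it cannot run alongside.
    """
    color = {}
    for task in sorted(conflicts):
        used = {color[n] for n in conflicts[task] if n in color}
        c = 0
        while c in used:
            c += 1
        color[task] = c
    return color


def parallel_rounds(color):
    """Group tasks by color; each group can run in one parallel round."""
    rounds = {}
    for task, c in color.items():
        rounds.setdefault(c, []).append(task)
    return [sorted(rounds[c]) for c in sorted(rounds)]


# Hypothetical conflict graph on five tasks (symmetric adjacency sets).
conflicts = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
color = greedy_coloring(conflicts)
rounds = parallel_rounds(color)
print(rounds)
```

Each inner list is an independent set of the conflict graph, so its tasks can run at the same time; the number of lists is the number of colors used.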

In practice the notion of a maximal independent set is used: a set is maximal independent if it is independent and stops being independent when any other node of the graph is added. Note that different maximal independent sets of the same graph can have different cardinalities.

If the nodes of the graph are labeled, a lexically first maximal independent set can be computed as follows: the lexically first node is included in the set; then we check whether the second node is directly connected to the first one, and add it to the set if it is not. Each node is checked in lexical order and is added only if it is not directly connected to any node already in the set. Such a set can be computed sequentially in O(|V|+|E|), i.e. linear, time. However, the problem of computing the lexically first maximal independent set is P-complete, so it is not believed to parallelize efficiently. Class NC is the class of problems that can be solved in polylogarithmic time with a polynomial number of processors. We do not know if P=NC.
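The sequential procedure for the lexically first maximal independent set can be sketched as follows. The representation is an assumption made for the example: nodes are labeled 0..n-1 and the lexical order is the numeric order.

```python
def lex_first_mis(n, edges):
    """Lexically first maximal independent set of a graph on nodes 0..n-1.

    Each adjacency list is scanned at most once, so the running time is
    O(|V| + |E|).
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    blocked = [False] * n  # blocked[v]: v is adjacent to a chosen node
    mis = []
    for v in range(n):
        if not blocked[v]:
            mis.append(v)
            for w in adj[v]:
                blocked[w] = True
    return mis


# The same star graph with two different labelings (hypothetical example).
star_center_first = lex_first_mis(4, [(0, 1), (0, 2), (0, 3)])
star_center_last = lex_first_mis(4, [(3, 0), (3, 1), (3, 2)])
print(star_center_first, star_center_last)
```

The two labelings of the star give maximal independent sets of sizes 1 and 3, which illustrates that maximal independent sets of the same graph can differ in cardinality.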

An example of a P-complete problem (and the first one proven to be such) is the circuit value problem. It asks for the value of a circuit built out of [and] and [or] gates, with given input bits and a single bit in the output layer.
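A minimal sequential evaluator makes the problem concrete. The gate encoding below is an assumption chosen for the example: gates are listed in topological order, and each gate refers to earlier gates by index.

```python
def evaluate_circuit(gates):
    """Evaluate a Boolean circuit and return the value of its last gate.

    Each gate is ("INPUT", bit), ("AND", i, j) or ("OR", i, j), where i and j
    are indices of earlier gates in the list.
    """
    values = []
    for gate in gates:
        kind = gate[0]
        if kind == "INPUT":
            values.append(bool(gate[1]))
        elif kind == "AND":
            values.append(values[gate[1]] and values[gate[2]])
        elif kind == "OR":
            values.append(values[gate[1]] or values[gate[2]])
        else:
            raise ValueError(f"unknown gate type: {kind}")
    return values[-1]


# Hypothetical circuit: (x0 AND x1) OR x2 with x0=1, x1=0, x2=1.
circuit = [
    ("INPUT", 1),
    ("INPUT", 0),
    ("INPUT", 1),
    ("AND", 0, 1),
    ("OR", 3, 2),
]
result = evaluate_circuit(circuit)
print(result)
```

The loop processes one gate at a time, which is easy sequentially; the difficulty captured by P-completeness is that no way is known to shorten this chain of dependencies to polylogarithmic parallel time for general circuits.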

Parallel algorithm for Delaunay triangulation refinement.

The input to the algorithm is a set of points. First, an initial DT is computed on it. Then, while “bad” triangles exist, either the circumcenter of a bad triangle is inserted or a boundary edge is split, and the DT is updated.

A bad triangle is a triangle whose radius/edge ratio (the circumradius divided by the length of the shortest edge) is too big.
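The radius/edge ratio can be computed directly from the vertex coordinates. The sketch below is illustrative; the threshold `bound` that decides when a ratio is "too big" is not fixed by these notes, so it is left as a parameter and the value used in the example is a hypothetical choice.

```python
import math


def radius_edge_ratio(a, b, c):
    """Circumradius of triangle abc divided by its shortest edge length."""
    la = math.dist(b, c)
    lb = math.dist(a, c)
    lc = math.dist(a, b)
    # Triangle area from the cross product of two edge vectors.
    area = abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / 2
    if area == 0:
        return math.inf  # degenerate (collinear) triangle
    circumradius = la * lb * lc / (4 * area)
    return circumradius / min(la, lb, lc)


def is_bad(a, b, c, bound):
    """A triangle is bad if its radius/edge ratio exceeds `bound`."""
    return radius_edge_ratio(a, b, c) > bound


equilateral = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))
skinny = ((0.0, 0.0), (1.0, 0.0), (0.5, 0.01))
bound = 1.0  # hypothetical threshold, chosen only for this example
print(is_bad(*equilateral, bound), is_bad(*skinny, bound))
```

The equilateral triangle has the smallest possible ratio, 1/sqrt(3), while the skinny triangle has a very large circumradius relative to its edges and is flagged as bad.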

Quad tree (the concept was discussed in the previous lecture): quad trees can be computed in parallel by assigning a new processor to each new box in the tree. The parallel time is then bounded by the height of the tree: if the smallest feature size is S and the input size is L, then the height of the tree (and the parallel time) is O(log(L/S)).
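The height bound can be checked on a small example. The sketch below makes several assumptions that are not in the notes: the input is a set of distinct points, a box is split until it holds at most one point, S is taken to be the smallest distance between two points, and L is the side of the root box. Under these assumptions a box at depth k is split only if its diagonal sqrt(2)*L/2^k is at least S, which gives the bound computed at the end.

```python
import math
from itertools import combinations


def quadtree_depth(points, x0, y0, side):
    """Depth of a point quadtree over the box [x0, x0+side) x [y0, y0+side).

    A box is split into four children while it holds more than one point.
    Points are assumed to be distinct, otherwise the recursion would not stop.
    """
    if len(points) <= 1:
        return 0
    half = side / 2
    children = [[], [], [], []]
    for x, y in points:
        index = (1 if x >= x0 + half else 0) + (2 if y >= y0 + half else 0)
        children[index].append((x, y))
    return 1 + max(
        quadtree_depth(child, x0 + half * (i % 2), y0 + half * (i // 2), half)
        for i, child in enumerate(children)
    )


# Hypothetical input: four points in the box [0, 8) x [0, 8).
points = [(1.0, 1.0), (1.5, 1.0), (6.0, 6.0), (7.0, 2.0)]
L = 8.0
S = min(math.dist(p, q) for p, q in combinations(points, 2))
depth = quadtree_depth(points, 0.0, 0.0, L)
bound = math.floor(math.log2(math.sqrt(2) * L / S)) + 1
print(depth, bound)
```

With one processor per box, every level of the tree can be processed in one parallel step, so the depth is the quantity that controls the parallel time.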

Chew’s Algorithm

The main features of the DT are the circumcircles of its triangles and the circles built on the boundary edges of the DT, because their centers are the candidates for insertion. We want to choose multiple bad circles at each iteration so that their centers can be inserted in parallel. At each step we will choose an independent subset of circles.

Two circumcircles are dependent (in conflict) if inserting the center of one of them makes the triangle of the other circumcircle disappear, i.e. the center falls inside the other circumcircle.

The algorithm (in short): Point Set => DT1 (first Delaunay triangulation) => C1 (circle system) => I1 (independent set of circles) => DT2 => C2 => I2 => DT3 => C3 => I3 => ... (terminate)

To minimize the number of iterations, a maximal independent set of candidates for insertion should be chosen at each step.
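The selection of a maximal independent set of circles can be sketched as follows. This is a sequential greedy illustration, not the parallel selection procedure, and the conflict test is an assumption based on the informal definition in these notes: two circles conflict if the center of one lies strictly inside the other. The circles in the example are hypothetical.

```python
import math


def in_conflict(circle_a, circle_b):
    """Assumed conflict test: one center lies strictly inside the other circle."""
    (center_a, radius_a), (center_b, radius_b) = circle_a, circle_b
    d = math.dist(center_a, center_b)
    return d < radius_a or d < radius_b


def maximal_independent_circles(circles):
    """Greedily pick indices of circles so that no two picked circles conflict.

    The result is maximal: every circle left out conflicts with a picked one.
    """
    chosen = []
    for i, circle in enumerate(circles):
        if all(not in_conflict(circle, circles[j]) for j in chosen):
            chosen.append(i)
    return chosen


# Hypothetical circle system: (center, radius) pairs.
circles = [
    ((0.0, 0.0), 2.0),
    ((1.0, 0.0), 2.0),
    ((5.0, 0.0), 1.0),
    ((5.5, 0.0), 1.0),
    ((10.0, 0.0), 3.0),
]
chosen = maximal_independent_circles(circles)
print(chosen)
```

The centers of the chosen circles can be inserted in the same iteration, since none of them destroys the triangle of another chosen circle.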

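Lemma: If Ca and Cb are two conflicting circumcircles at iteration i and Ra and Rb are their circumradii, then Rb/2 < Ra < 2Rb.

Proof: Let the center of Cb lie inside Ca. Ca passes through the vertices of its triangle, and these vertices cannot lie inside Cb, because Cb is empty. Such a vertex is less than 2Ra away from the center of Cb and at least Rb away from it, so the diameter of Ca must be greater than the radius of Cb. So, Ra > Rb/2. The symmetric argument, with the roles of Ca and Cb exchanged, gives Ra < 2Rb.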

To prove that Chew’s parallel refinement algorithm works in logarithmic time, we will show that the radius of the largest Delaunay circle is reduced by a constant factor after a constant number of iterations. The following lemma does that.

Lemma: Let Ri be the radius of the largest Delaunay circle at the end of iteration i. Then, for all i, R_(i+98) <= 3Ri/4, where R_(i+98) is the radius at the end of iteration i+98.

Proof: Let Ci be the largest circle in the i’th iteration, with center Ci* and radius Ri. Suppose, for contradiction, that Rk > 3Ri/4 for some i and k with k-i = 98; in particular, the algorithm has not stopped by iteration k. We will prove that for each iteration j, i < j <= k, there is a bad triangle with circumcircle Cj’, circumcenter Cj*’, and radius Rj’ near Ck*. Because vertices are never removed in the algorithm, a circle that is empty at iteration k was also empty in every earlier iteration.

Ck is dilated in three steps to locate Cj’ that passes through points Pj, Qj, Tj that exist during iteration j.

1) Dilate Ck while keeping its center fixed until it touches Pj.

2) Keep dilating it while keeping the tangency fixed at Pj and moving the center away from Pj, until it touches Qj.

3) Keep dilating, keeping touch points at Pj and Qj fixed, and moving the center away from PjQj, until it touches Tj.

The resulting Cj’ is the circumcircle of the Delaunay triangle Pj Qj Tj at iteration j; the triangle is bad, because its circumradius is at least Rk. So, Cj*’ is a candidate vertex at iteration j. By construction, Rj’ >= Rk > 3Ri/4.

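Consider the triangle Pj Ck* Cj*’. It is non-acute at vertex Ck*. Let x = |Ck* Pj| and y = |Ck* Cj*’|; the third side is |Cj*’ Pj| = Rj’. By the law of cosines, (Rj’)^2 = x^2 + y^2 - 2xy*cos(angle Cj*’ Ck* Pj). The angle Cj*’ Ck* Pj is non-acute, so its cosine is non-positive and (Rj’)^2 >= x^2 + y^2. Since Ck is empty at iteration j, x >= Rk > 3Ri/4; together with Rj’ <= Ri this gives Rj’ < x + Ri/4. So, (x + Ri/4)^2 >= x^2 + y^2, i.e. y^2 <= x*Ri/2 + (Ri)^2/16. Because x <= Rj’ <= Ri, the right-hand side is at most 9(Ri)^2/16. Therefore y = ||Cj*’ - Ck*|| <= 3Ri/4 < Rk, so Cj*’ lies inside Ck. So, Cj*’ must have been rejected during iteration j; otherwise Ck would not be empty at the k’th iteration.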

Since Cj*’ was rejected, some inserted center conflicts with it; we will call it Cj*’’. By the previous lemma, its radius Rj’’ is no more than twice and no less than half of the radius of Cj’. So, Rj’’ is at least Rj’/2 > 3Ri/8. Rj’’ is no more than the radius of the largest Delaunay circle at iteration j, so it is no more than Ri. Therefore, Cj*’’ and Cj*’ are at most Ri away from each other. If we consider the set of centers Cj*’’ inserted at iterations j from i+1 to k, we can see that they must be at least 3Ri/8 apart from each other, and that they all lie in a bounded region around Ck*. Therefore, we have a circle packing problem, and we can show that one can pack no more than 97 circles in that region. This contradicts the 98 insertions made at iterations i+1, ..., k.

The above lemma implies that the parallel Chew’s refinement algorithm takes O(log(L/s)) steps, where L is the input diameter and s is the length of the shortest edge in the output mesh. The lemma states that the radius of the largest Delaunay circle shrinks by at least a factor of 3/4 every 98 iterations, and therefore the algorithm should terminate after at most about 98*log(base 4/3, L/s) steps.
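The bound can be evaluated numerically. The sketch below is a rough illustration under the assumption that the largest circle starts with a radius on the order of L and that refinement stops once it is on the order of s; the function name and the example ratio L/s = 1000 are hypothetical. Note that the logarithm is taken to base 4/3, since the radius is multiplied by at most 3/4 in each block of 98 iterations.

```python
import math


def iteration_bound(L, s, shrink=3 / 4, window=98):
    """Upper estimate on iterations: `window` iterations per shrink by `shrink`.

    After m blocks of `window` iterations the largest radius is at most
    shrink**m times its initial value, so m = ceil(log_{1/shrink}(L / s))
    blocks are enough to go from scale L down to scale s.
    """
    blocks = math.ceil(math.log(L / s) / math.log(1 / shrink))
    return window * blocks


steps = iteration_bound(L=1000.0, s=1.0)
print(steps)
```

The constant 98 is large, but it does not depend on the input, so the number of parallel iterations still grows only logarithmically in L/s.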

Theorem: Let P = {P1, ..., Pn} be a collection of points in R^2. Then the minimum spanning tree of P is a subgraph of DT(P).

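This theorem implies that, using the DT, we can compute the MST in O(n*log(n)) time: the DT can be computed in O(n*log(n)) time and has only O(n) edges, so a standard MST algorithm run on the edges of the DT takes O(n*log(n)) time as well.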
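The theorem can be checked on a small random point set. The sketch below is a brute-force verification only, not the O(n*log(n)) algorithm: it finds Delaunay edges by testing every triple of points for an empty circumcircle (O(n^4) time) and compares them with the MST computed by Kruskal's algorithm on all pairs. The point set, the seed, and the tolerance 1e-9 are assumptions made for the example; points are assumed to be in general position.

```python
import math
import random
from itertools import combinations


def circumcircle(a, b, c):
    """Return (center, radius) of the circle through a, b, c, or None if collinear."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if d == 0:
        return None
    sa, sb, sc = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy), math.dist((ux, uy), a)


def delaunay_edges(pts):
    """Edges of all triangles whose circumcircle contains no other point."""
    edges = set()
    for i, j, k in combinations(range(len(pts)), 3):
        circle = circumcircle(pts[i], pts[j], pts[k])
        if circle is None:
            continue
        center, radius = circle
        others = (m for m in range(len(pts)) if m not in (i, j, k))
        if all(math.dist(center, pts[m]) >= radius - 1e-9 for m in others):
            edges |= {(i, j), (i, k), (j, k)}
    return edges


def mst_edges(pts):
    """Kruskal's algorithm on the complete Euclidean graph, with union-find."""
    parent = list(range(len(pts)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = set()
    pairs = sorted(combinations(range(len(pts)), 2),
                   key=lambda e: math.dist(pts[e[0]], pts[e[1]]))
    for i, j in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.add((i, j))
    return tree


random.seed(0)
pts = [(random.random(), random.random()) for _ in range(12)]
mst_in_dt = mst_edges(pts) <= delaunay_edges(pts)
print(mst_in_dt)
```

In a fast implementation the brute-force step would be replaced by an O(n*log(n)) Delaunay algorithm, and Kruskal's algorithm would be run only on the O(n) Delaunay edges.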
