Data Structures for Disjoint Sets Course: CS 5130 - Advanced Data Structures and Algorithms Instructor: Dr. Badri Adhikari

for Disjoint Sets Data Structures - umsl.eduumsl.edu/~adhikarib/cs4130-fall2017/slides/04 - Data Structures for... · Data Structures for Disjoint Sets Course: CS 5130 - Advanced


Overview

Some applications involve grouping n distinct elements into a collection of disjoint sets.

Two frequent operations in such applications:

(a) finding the unique set that contains a given element
(b) uniting two sets

How can we maintain a data structure that supports these operations?

Two implementations:

(a) Linked-list implementation of disjoint sets
(b) Rooted-tree implementation of disjoint sets

Disjoint set operations

A disjoint-set data structure maintains a collection S = {S1, S2, S3, ..., Sk} of disjoint dynamic sets.

Each set is identified by a representative, which is some member of the set.

- In some applications, it does not matter which member is used.
- In some applications, it is the smallest member.
- In some applications, it is a user-selected member.

Each element of a set is represented by an object x.

Disjoint set operations

MAKE-SET(x) - creates a new set whose only member (and thus representative) is x. Because the sets are disjoint, x must not already be in another set.

UNION(x, y) - unites the dynamic sets that contain x and y, say Sx and Sy, into a new set.

What will be the new representative? x or y? Where do we implement it?

FIND-SET(x) - returns a pointer to the representative of the (unique) set containing x.

The running time of a disjoint-set data structure depends on two parameters:

(a) n - the number of MAKE-SET operations
(b) m - the total number of MAKE-SET, UNION, and FIND-SET operations

Always, m ≥ n. Why? (The n MAKE-SET operations are themselves counted in m.)

Example application 1 - Reachability in Maze

Maze - Is B reachable from A?

https://www.coursera.org/learn/data-structures/

preprocess(maze) {
    for each cell c in maze:
        MAKE-SET(c)
    for each cell c in maze:
        for each neighbor n of c:
            UNION(c, n)
}

is-reachable(A, B) {
    return FIND-SET(A) = FIND-SET(B)
}
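The preprocessing and query steps for the maze can be sketched in Python. This is a minimal sketch under assumptions that are not in the slides: the maze is a hypothetical rectangular list of strings in which '.' is an open cell and '#' is a wall, neighbors are the four adjacent open cells, and the sets are stored in a plain parent dictionary with no heuristics.

```python
def make_set(parent, x):
    parent[x] = x                      # x starts as its own representative

def find_set(parent, x):
    while parent[x] != x:              # follow parent pointers to the representative
        x = parent[x]
    return x

def union(parent, x, y):
    rx, ry = find_set(parent, x), find_set(parent, y)
    if rx != ry:
        parent[ry] = rx

def preprocess(maze):
    """Build one set per open cell, then unite neighboring open cells."""
    parent = {}
    open_cells = [(r, c)
                  for r in range(len(maze))
                  for c in range(len(maze[0]))
                  if maze[r][c] == "."]
    for cell in open_cells:
        make_set(parent, cell)
    for (r, c) in open_cells:
        # Down and right are enough: UNION is symmetric, so the up and left
        # neighbors are covered when those cells are visited themselves.
        for neighbor in ((r + 1, c), (r, c + 1)):
            if neighbor in parent:
                union(parent, (r, c), neighbor)
    return parent

def is_reachable(parent, a, b):
    return find_set(parent, a) == find_set(parent, b)

# Hypothetical 3x4 maze: the top-right pocket is walled off from the rest.
MAZE = [
    "..#.",
    ".##.",
    "...#",
]
parent = preprocess(MAZE)
```

With this maze, is_reachable(parent, (0, 0), (2, 2)) holds because the two cells are joined through the left column, while (0, 3) sits in a separate set.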

Example application 2 - Connected Components

Determine connected components in an undirected graph!

[Figure: a graph with four connected components]

SAME-COMPONENT(a, d)

SAME-COMPONENT(f, i)

Example application 2 - Connected Components

[Figure: the collection of disjoint sets after processing each edge, one edge at a time]
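The edge-by-edge processing can be sketched as follows. The figure itself is not recoverable here, so the vertex and edge lists below are an assumption: a hypothetical graph chosen only so that it has four connected components. The function names are illustrative as well.

```python
def connected_components(vertices, edges):
    """Return a find_set function after uniting the endpoints of every edge."""
    parent = {v: v for v in vertices}      # MAKE-SET for every vertex

    def find_set(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in edges:                      # process one edge at a time
        ru, rv = find_set(u), find_set(v)
        if ru != rv:                        # UNION only if not already together
            parent[rv] = ru
    return find_set

def same_component(find_set, u, v):
    return find_set(u) == find_set(v)

# Hypothetical graph with components {a,b,c,d}, {e,f,g}, {h,i}, {j}.
VERTICES = "abcdefghij"
EDGES = [("b", "d"), ("e", "g"), ("a", "c"), ("h", "i"),
         ("a", "b"), ("e", "f"), ("b", "c")]

find_set = connected_components(VERTICES, EDGES)
num_components = len({find_set(v) for v in VERTICES})
```

On this assumed graph, same_component(find_set, "a", "d") is true and same_component(find_set, "f", "i") is false.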

Linked list representation of disjoint sets

Say, S1 contains members f, g, and d with f as the representative member.

Each object in the list contains a set member, a pointer to the next object in the list, and a pointer back to the set object. Each set object has pointers head and tail to the first and last objects in its list.

MAKE-SET(x) - we create a new linked list whose only object is x.

FIND-SET(x) - we follow the pointer from x back to its set object and then return the member in the object that head points to. For example, FIND-SET(g) would return f.

MAKE-SET(x) and FIND-SET(x) both need O(1) time. How?

[Figure: linked-list representation of disjoint set S1]

A simple implementation of Union

We can perform UNION(x, y) by appending y’s list onto the end of x’s list.

x’s representative becomes the resulting set’s representative.

Use the tail pointer of x’s list to quickly find where to append y’s list.

We must update the pointer to the set object for each object originally in y’s list, which takes time linear in the length of y’s list.

Example: UNION(g, e) causes pointer updates for c, h, e, b.

[Figure: result of UNION(g, e)]

If the objects did not keep pointers back to the set object (and through it to the head), UNION would take far less time. What is the downside?
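The linked-list representation and the simple UNION can be sketched like this. The class and attribute names are illustrative assumptions, and find_set returns the representative's member value rather than a pointer. The demo rebuilds the example sets from the slides: S1 with members f, g, d and a second set with members c, h, e, b.

```python
class SetObject:
    """Holds the head and tail pointers of one list."""
    def __init__(self):
        self.head = None
        self.tail = None

class Node:
    """One list object: a member, a next pointer, and a pointer back to the set object."""
    def __init__(self, member):
        self.member = member
        self.next = None
        self.set = None

def make_set(member):
    node = Node(member)
    s = SetObject()
    s.head = s.tail = node
    node.set = s
    return node

def find_set(node):
    return node.set.head.member            # two pointer hops: O(1)

def union(x, y):
    """Append y's list to x's list; return how many set pointers were updated."""
    sx, sy = x.set, y.set
    if sx is sy:
        return 0
    sx.tail.next = sy.head                 # the tail pointer makes the splice O(1)
    sx.tail = sy.tail
    updates = 0
    node = sy.head
    while node is not None:                # linear in the length of y's list
        node.set = sx
        updates += 1
        node = node.next
    return updates

nodes = {m: make_set(m) for m in "fgdcheb"}
union(nodes["f"], nodes["g"])
union(nodes["f"], nodes["d"])              # S1 = f, g, d
union(nodes["c"], nodes["h"])
union(nodes["c"], nodes["e"])
union(nodes["c"], nodes["b"])              # second set = c, h, e, b
updated = union(nodes["g"], nodes["e"])    # pointer updates for c, h, e, b
```

After the last call, find_set on any of the seven nodes returns f, matching the rule that x's representative becomes the representative of the result.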

Running time of the linked list implementation

Suppose we have objects x1, x2, …, xn. We execute a sequence of n MAKE-SET operations followed by n-1 UNION operations, so that m = 2n-1.

[m - the total number of MAKE-SET, UNION, and FIND-SET operations]

Total time for n MAKE-SET operations = Θ(n)

The i-th UNION operation updates i objects, so the number of objects updated by all n-1 UNION operations is

Σ_{i=1}^{n-1} i = n(n-1)/2 = Θ(n^2)

The whole sequence of 2n-1 operations on n objects therefore takes Θ(n^2) time, so each operation requires Θ(n) time on average.

A weighted-union heuristic

In the worst case, our implementation of the UNION procedure requires an average of Θ(n) time per call. Why? Perhaps we are always appending a longer list onto a shorter list.

Solution: we maintain the length of each list along with the list.

This way, we can always append the shorter list onto the longer one.

With this simple weighted-union heuristic, a single UNION operation can still take Ω(n) time if both sets have Ω(n) members.

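Overall, the total time spent updating object pointers over all UNION operations is O(n lg n): an object's pointer is updated only when it is in the shorter list, so its set at least doubles in size each time, and that can happen at most lg n times per object. That is, each UNION operation takes O(lg n) time on average.

Each MAKE-SET and FIND-SET takes O(1) time, and there are O(m) of them in total. Thus the total time for the entire sequence is O(m + n lg n).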
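A small experiment can make the O(n lg n) bound concrete. The sketch below is an assumption-laden stand-in for the linked lists: each set is a Python list whose first element acts as the representative, a dictionary entry plays the role of an object's pointer to its set, and the class name WeightedListSets is hypothetical. It counts one pointer update per member of the shorter list.

```python
import math

class WeightedListSets:
    def __init__(self):
        self.members = {}      # member -> the list holding its whole set
        self.updates = 0       # total "pointer to set object" updates so far

    def make_set(self, x):
        self.members[x] = [x]

    def find_set(self, x):
        return self.members[x][0]

    def union(self, x, y):
        a, b = self.members[x], self.members[y]
        if a is b:
            return
        if len(a) < len(b):
            a, b = b, a                    # a is now the longer list
        for m in b:                        # only the shorter list is relabelled
            self.members[m] = a
            self.updates += 1
        a.extend(b)

n = 8

# The chain sequence that costs Θ(n^2) without the heuristic.
chain = WeightedListSets()
for i in range(n):
    chain.make_set(i)
for i in range(1, n):
    chain.union(i, i - 1)

# Balanced merging of equal-size sets, the costly case for weighted union.
balanced = WeightedListSets()
for i in range(n):
    balanced.make_set(i)
step = 1
while step < n:
    for i in range(0, n, 2 * step):
        balanced.union(i, i + step)
    step *= 2

bound = n * math.log2(n)
```

For n = 8 the chain sequence performs n-1 updates in total, and the balanced sequence performs (n/2) lg n updates, which stays within the n lg n bound.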

Disjoint-set forests

We represent sets by rooted trees, with each node containing one member and each tree representing a set.

- Each member points only to its parent.
- The root of each tree contains the representative and is its own parent.
- MAKE-SET creates a tree with just one node.
- FIND-SET follows parent pointers until it reaches the root of the tree. The nodes visited on this simple path toward the root constitute the find path.
- UNION causes the root of one tree to point to the root of the other.

Without further heuristics, algorithms that use this representation are no faster than the ones that use the linked-list representation.

- Each MAKE-SET takes O(1) time.
- Each UNION takes O(1) time once the two roots are known.
- Each FIND-SET can take anywhere from O(1) to O(n) time.

(FIND-SET is the challenge here, just as UNION is the challenge in the linked-list representation.)

[Figure: UNION of two trees]
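The plain forest, with no heuristics, can be sketched with a parent dictionary. The helper find_path is a hypothetical name; it returns the whole find path so that its length is visible. The union order below is chosen on purpose to show how FIND-SET can degrade to O(n).

```python
def make_set(parent, x):
    parent[x] = x                          # a root is its own parent

def find_path(parent, x):
    """Return the find path from x up to the root (the root is the last entry)."""
    path = [x]
    while parent[x] != x:
        x = parent[x]
        path.append(x)
    return path

def union(parent, x, y):
    rx = find_path(parent, x)[-1]
    ry = find_path(parent, y)[-1]
    if rx != ry:
        parent[rx] = ry                    # root of x's tree points to root of y's tree

n = 6
parent = {}
for i in range(n):
    make_set(parent, i)
for i in range(n - 1):
    union(parent, i, i + 1)                # each union hangs the old tree under a new root

path = find_path(parent, 0)
```

Here the n-1 unions build a single chain, so the find path from node 0 visits all n nodes.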

Heuristics to improve running time - Union by Rank

Scenario: A sequence of n-1 UNION operations may create a tree that is just a linear chain of n nodes.

Similar to the weighted-union heuristic, we can make the root of the tree with fewer nodes point to the root of the tree with more nodes.

For each node, we maintain a rank, which is an upper bound on the height of the node.

We make the root with smaller rank point to the root with larger rank during a UNION operation.

This improves the worst-case time of each FIND-SET from O(n) to O(lg n). The total running time is O(m lg n), because each UNION runs FIND-SET on both of its arguments, so each of the m operations costs at most O(lg n).
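To see the effect of union by rank on its own (no path compression), the sketch below repeats a chain-building sequence of unions and measures how many parent pointers a find must follow. The dictionaries parent and rank and the helper find_depth are illustrative assumptions.

```python
import math

def make_set(parent, rank, x):
    parent[x] = x
    rank[x] = 0

def find_depth(parent, x):
    """Return (root, number of parent pointers followed to reach it)."""
    depth = 0
    while parent[x] != x:
        x = parent[x]
        depth += 1
    return x, depth

def union_by_rank(parent, rank, x, y):
    rx, _ = find_depth(parent, x)
    ry, _ = find_depth(parent, y)
    if rx == ry:
        return
    if rank[rx] < rank[ry]:
        rx, ry = ry, rx                    # rx now has the larger (or equal) rank
    parent[ry] = rx                        # smaller-rank root points to larger-rank root
    if rank[rx] == rank[ry]:
        rank[rx] += 1                      # ranks were equal, so the new root grows by one

n = 16
parent, rank = {}, {}
for i in range(n):
    make_set(parent, rank, i)
for i in range(n - 1):
    union_by_rank(parent, rank, i, i + 1)

max_depth = max(find_depth(parent, i)[1] for i in range(n))
```

With this sequence every node ends up directly under one root, and in general the depth cannot exceed lg n when roots are linked by rank.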

Heuristics to improve running time - Path compression

Path compression is simple and yet highly effective.

During the FIND-SET operations, make each node on the find path point directly to the root.

Path compression does not change any ranks.

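What is the consequence?

Future FIND-SET operations on the nodes of that find path reach the root in a single step, for as long as that root remains a root.

Combined with union by rank, the total running time is O(m α(n)), where α(n) is a very slowly growing function (at most 4 for any practical n). The total is therefore almost linear in m, and each operation takes almost constant time on average.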

[Figure: path compression during the FIND-SET operation. Triangles are subtrees whose root nodes are shown. Left: prior to executing FIND-SET(a). Right: after executing FIND-SET(a).]

Pseudocode for disjoint-set forests

With each node x, we maintain the integer value x.rank, which is an upper bound on the height of x. The parent of x is x.p.

When MAKE-SET creates a singleton set, the single node in the corresponding tree has an initial rank of 0.

Each FIND-SET operation leaves the ranks unchanged.

The FIND-SET procedure is a two-pass method: as it recurses, it makes one pass up the find path to find the root, and as the recursion unwinds, it makes a second pass back down the find path to update each node to point directly to the root.

Path compression implementation
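The two-pass FIND-SET described above might look like the following in Python. The original pseudocode image is not available here, so this is a sketch that follows the textual description; the Node class is an assumption that mirrors the x.p and x.rank notation, and the chain is wired by hand purely for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.p = self          # MAKE-SET: a singleton root is its own parent
        self.rank = 0          # MAKE-SET: initial rank 0

def find_set(x):
    if x.p is not x:
        # First pass: the recursion climbs the find path to the root.
        # Second pass: as each call returns, x is pointed directly at the root.
        x.p = find_set(x.p)
    return x.p

# Hand-wired chain a -> b -> c -> d, with d as the root.
a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.p, b.p, c.p = b, c, d

root = find_set(a)
```

After the call, a, b, and c all point directly at d, and no rank has changed. For very long find paths in Python, an iterative version may be preferable because of the interpreter's recursion limit.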

Pseudocode for disjoint-set forests

The UNION operation has two cases, depending on whether the roots of the trees have equal ranks.

If the roots have unequal ranks, we make the root with higher rank the parent of the root with lower rank, but the ranks themselves remain unchanged.

If the roots have equal ranks, we arbitrarily choose one of the roots as the parent and increment its rank.
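The two UNION cases can be sketched as a link step applied to the two roots. The names Node, link, and union are assumptions in the spirit of the x.p and x.rank notation, and the early return for two nodes that already share a root is a small addition that the slides do not mention.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.p = self          # root is its own parent
        self.rank = 0

def find_set(x):
    if x.p is not x:
        x.p = find_set(x.p)    # path compression
    return x.p

def link(x, y):
    """Link two distinct roots x and y by rank."""
    if x.rank > y.rank:
        y.p = x                # unequal ranks: higher rank becomes parent, ranks unchanged
    else:
        x.p = y
        if x.rank == y.rank:
            y.rank += 1        # equal ranks: y is chosen as parent and its rank grows

def union(x, y):
    rx, ry = find_set(x), find_set(y)
    if rx is ry:
        return                 # already in the same set (added guard, see lead-in)
    link(rx, ry)

a, b, c = Node("a"), Node("b"), Node("c")
union(a, b)                    # equal ranks: b becomes the parent, b.rank becomes 1
union(c, a)                    # unequal ranks: c (rank 0) goes under b (rank 1)
```

The first union exercises the equal-rank case and the second the unequal-rank case, after which b is the representative of all three nodes and still has rank 1.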


Classwork

Draw a linked-list representation and forest representation of the following disjoint-set graph:

[Figure: disjoint sets S1, S2, and S3]

Summary

Disjoint sets can be represented in two ways - using linked lists and using trees/forests.

The linked-list implementation with the weighted-union heuristic has a total running time of O(m + n lg n).

With both the ‘union by rank’ heuristic and the path compression heuristic, the disjoint-set forest implementation has a total running time of O(m α(n)), which is almost linear in m.