

  • CS 240 Tutorial 10 Notes

    Range Trees: Used for range searching multi-dimensional data, e.g., find all points (x, y) which satisfy

    x_lo ≤ x ≤ x_hi and y_lo ≤ y ≤ y_hi

    given some points and ranges [x_lo, x_hi], [y_lo, y_hi].

    Brute-force method is to search all n points and check which lie in the range: O(n) cost.
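As a baseline for comparison, the brute-force check can be sketched in a few lines. The function name `brute_force_range` and the list-of-tuples representation are assumptions made for this illustration, not part of the notes.

```python
def brute_force_range(points, x_lo, x_hi, y_lo, y_hi):
    """Return every point (x, y) with x_lo <= x <= x_hi and y_lo <= y <= y_hi.

    Every one of the n points is inspected once, so the cost is O(n)
    regardless of how many points are reported.
    """
    return [(x, y) for (x, y) in points
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi]


points = [(1, 8), (2, 6), (3, 4), (5, 7)]
print(brute_force_range(points, 2.5, 6.5, 2.5, 6.5))
```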

    Range trees are a way of preprocessing the data to make such searches more efficient.

    If all the points lie in the range, even just outputting them takes Ω(n) time, so we can't hope to do better than brute-force in the worst case.

    But we can do better in the case when the output size (say k) is significantly smaller than n.

    In 2 dimensions, a search in a range tree costs O(k + (log n)^2).

    1-dimensional range trees are balanced binary search trees where the points are stored in the leaves. To allow binary search to work, each non-leaf contains the value of the largest leaf in its left child's subtree.

    Example: Draw the perfectly balanced 1-dimensional range trees on the x and y coordinates of the points

    (1, 8), (2, 6), (3, 4), (5, 7).

    Answer:

    Tree on x:               Tree on y:

          2                        6
        /   \                    /   \
       1     3                  4     7
      / \   / \                / \   / \
     1   2 3   5              4   6 7   8
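The construction can be sketched as follows, assuming a sorted list of distinct values as input. The names `Node`, `build`, and `level_order` are hypothetical helpers introduced for this illustration; the notes do not prescribe an implementation.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        # Leaf: key is the stored value.
        # Non-leaf: key is the largest leaf value in the left subtree.
        self.key = key
        self.left = left
        self.right = right


def build(sorted_values):
    """Build a balanced 1-dimensional range tree from a non-empty sorted list."""
    if len(sorted_values) == 1:
        return Node(sorted_values[0])
    mid = (len(sorted_values) + 1) // 2   # the left half gets the extra leaf
    left = build(sorted_values[:mid])
    right = build(sorted_values[mid:])
    return Node(sorted_values[mid - 1], left, right)


def level_order(root):
    """Return the keys level by level, to compare against a drawing."""
    levels, frontier = [], [root]
    while frontier:
        levels.append([n.key for n in frontier])
        frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
    return levels


print(level_order(build([1, 2, 3, 5])))   # x coordinates of the example points
print(level_order(build([4, 6, 7, 8])))   # y coordinates of the example points
```

For the example points this gives root 2 over non-leaves 1 and 3 for the x coordinates, and root 6 over non-leaves 4 and 7 for the y coordinates.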

    Example: Search for the range [2.5, 6.5] in both trees.

    Answer: Nodes in [brackets] are those on the search paths; leaves marked with * are those which are in range:

    Tree on x:                     Tree on y:

           [2]                            [6]
          /   \                          /   \
         1    [3]                      [4]   [7]
        / \   /  \                     / \   /  \
       1   2 [3]* [5]*              [4]*  6* [7]  8

    In general, you want to output all the leaves which lie between the search paths, and possibly the two leaves which lie on the search paths; check if these points lie in range manually. (Explain how pseudocode works.)
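A minimal recursive sketch of this search is given below. It is not the pseudocode from the tutorial; it assumes distinct values and uses hypothetical names (`Node`, `build`, `range_search`). At each non-leaf it descends only into children that can still contain values in [lo, hi], so it visits the two search paths plus the subtrees between them.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key      # leaf: the value; non-leaf: largest leaf in left subtree
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None


def build(sorted_values):
    """Balanced 1-dimensional range tree from a non-empty sorted list."""
    if len(sorted_values) == 1:
        return Node(sorted_values[0])
    mid = (len(sorted_values) + 1) // 2
    return Node(sorted_values[mid - 1],
                build(sorted_values[:mid]),
                build(sorted_values[mid:]))


def range_search(node, lo, hi, out):
    """Append every leaf value v with lo <= v <= hi to out, in sorted order."""
    if node.is_leaf():
        if lo <= node.key <= hi:     # leaves on the search paths are checked manually
            out.append(node.key)
        return
    if lo <= node.key:               # left subtree holds values <= key
        range_search(node.left, lo, hi, out)
    if hi > node.key:                # right subtree holds values > key
        range_search(node.right, lo, hi, out)


found_x, found_y = [], []
range_search(build([1, 2, 3, 5]), 2.5, 6.5, found_x)
range_search(build([4, 6, 7, 8]), 2.5, 6.5, found_y)
print(found_x, found_y)
```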

    A range tree on two dimensions x and y is a range tree on x with leaves containing the points (x, y) and non-leaves containing a pointer to a range tree on y (only using the points which appear as descendants of the non-leaf).


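  • Example: Draw the perfectly balanced 2-dimensional range tree using the previous points.

    Answer: The primary tree on x:

                  2
              /       \
            1           3
          /   \       /   \
      (1, 8) (2, 6) (3, 4) (5, 7)

    The tree on y associated with the root 2:

                  6
              /       \
            4           7
          /   \       /   \
      (3, 4) (2, 6) (5, 7) (1, 8)

    The tree on y associated with the non-leaf 1:

            6
          /   \
      (2, 6) (1, 8)

    The tree on y associated with the non-leaf 3:

            4
          /   \
      (3, 4) (5, 7)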

    Range search works the same way as in the 1-dimensional case, except instead of just printing the leaves which appear between the search paths, you start searching in the associated trees which lie between the search paths using the next pair of range values.
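The 2-dimensional search can be sketched as follows. This is an illustrative implementation under stated assumptions, not the tutorial's own code: x coordinates are distinct, `Node`, `build`, `search_y`, and `query_2d` are hypothetical names, and the build re-sorts at every level for brevity rather than speed.

```python
class Node:
    def __init__(self, key, left=None, right=None, point=None, assoc=None):
        self.key = key        # non-leaf: largest coordinate in the left subtree
        self.left = left
        self.right = right
        self.point = point    # set only on leaves
        self.assoc = assoc    # set only on non-leaves of the x-tree: a tree on y


def build(points, dim, two_d):
    """Balanced range tree on coordinate dim; if two_d, non-leaves get a y-tree."""
    pts = sorted(points, key=lambda p: p[dim])
    if len(pts) == 1:
        return Node(pts[0][dim], point=pts[0])
    mid = (len(pts) + 1) // 2
    node = Node(pts[mid - 1][dim],
                build(pts[:mid], dim, two_d),
                build(pts[mid:], dim, two_d))
    if two_d:
        node.assoc = build(pts, 1, False)
    return node


def search_y(node, y_lo, y_hi, out):
    """1-dimensional search in a y-tree whose leaves hold whole points."""
    if node.point is not None:
        if y_lo <= node.key <= y_hi:
            out.append(node.point)
        return
    if y_lo <= node.key:
        search_y(node.left, y_lo, y_hi, out)
    if y_hi >= node.key:
        search_y(node.right, y_lo, y_hi, out)


def query_2d(root, x_lo, x_hi, y_lo, y_hi):
    out = []

    def check(leaf):                      # leaf on a search path: test manually
        x, y = leaf.point
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            out.append(leaf.point)

    def between(node):                    # subtree between the two search paths
        if node.point is not None:
            check(node)
        else:
            search_y(node.assoc, y_lo, y_hi, out)

    split = root                          # walk down until the two paths diverge
    while split.point is None and not (x_lo <= split.key < x_hi):
        split = split.left if x_hi <= split.key else split.right
    if split.point is not None:
        check(split)
        return out
    node = split.left                     # follow the path towards x_lo
    while node.point is None:
        if x_lo <= node.key:
            between(node.right)
            node = node.left
        else:
            node = node.right
    check(node)
    node = split.right                    # follow the path towards x_hi
    while node.point is None:
        if x_hi > node.key:
            between(node.left)
            node = node.right
        else:
            node = node.left
    check(node)
    return out


tree = build([(1, 8), (2, 6), (3, 4), (5, 7)], 0, True)
print(query_2d(tree, 2.5, 6.5, 2.5, 6.5))
```

Each of the O(log n) subtrees hanging between the search paths triggers one search in an associated y-tree, which is where the (log n)^2 term in the search cost comes from.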

    Pseudocode for 1-dimensional range search:

    rangesearch(node, lo, hi)
        loop
            if node is a leaf
                print node.value if lo ≤ node.value ≤ hi

  • Note: Part of the definition of a range tree is that it is balanced, e.g., an AVL tree. This complicates adding new points into a range tree. For example:

    Start:

            2
          /   \
      (2, 3)    3
              /   \
          (3, 4) (4, 5)

    After add (5, 6):

            2
          /   \
      (2, 3)    3
              /   \
          (3, 4)    4
                  /   \
              (4, 5) (5, 6)

    After rotate:

                 3
              /     \
            2         4
          /   \     /   \
      (2, 3) (3, 4) (4, 5) (5, 6)

    And now the tree associated with 3 has to be set to the tree previously associated with 2, and the tree associated with 2 has to contain (2, 3) and (3, 4).

    In general,

          2                          3
         / \                       /   \
        A   3        rotate       2     4
           / \       ----->      / \   / \
          B   4                 A   B C   D
             / \
            C   D

    and the tree associated with 2 has to be set to the tree associated with A with the points in B added.
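The bookkeeping during a rotation can be sketched as below. To keep the example short, the associated structure of each node is simplified to a list of its points sorted by y, which is an assumption standing in for a real tree on y; `Node`, `leaf`, `internal`, and `rotate_left` are hypothetical names.

```python
class Node:
    def __init__(self, key, left=None, right=None, assoc=None):
        self.key = key
        self.left = left
        self.right = right
        self.assoc = assoc    # points below this node, sorted by y (stand-in for the y-tree)


def leaf(point):
    return Node(point[0], assoc=[point])


def internal(key, left, right):
    merged = sorted(left.assoc + right.assoc, key=lambda p: p[1])
    return Node(key, left, right, merged)


def rotate_left(node):
    """Left rotation: the right child of node becomes the root of the subtree."""
    new_root = node.right
    a, b = node.left, new_root.left
    # The new root covers exactly the points the old root used to cover,
    # so it simply takes over the old root's associated structure.
    new_root.assoc = node.assoc
    # The old root now covers only A and B, so its structure must be rebuilt.
    node.right = b
    node.assoc = sorted(a.assoc + b.assoc, key=lambda p: p[1])
    new_root.left = node
    return new_root


# The unbalanced tree from the example, after adding (5, 6).
n4 = internal(4, leaf((4, 5)), leaf((5, 6)))
n3 = internal(3, leaf((3, 4)), n4)
n2 = internal(2, leaf((2, 3)), n3)

root = rotate_left(n2)
print(root.key, root.assoc)
print(root.left.key, root.left.assoc)
```

Unlike a rotation in a plain AVL tree, this step is not constant time: rebuilding the structure for the old root takes time that grows with the number of points in A and B.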

    A3Q1. (a) Find a way to sort an array A[1..n] with O(log n) distinct elements in O(n log log n) time.

    Idea: If one could find the frequency with which each element appears, then one could just sort the O(log n) distinct elements and add enough copies of the elements to achieve the proper frequency.

    How to find the correct frequencies? The easiest thing is to store them in an associative array, loop through A, and for each item encountered, increment its corresponding frequency.

    Using an unordered array, finding the counter to increment costs O(log n) (the length of the associative array) and there are O(n) increments, so this costs O(n log n).

    Using an ordered array, finding the counter to increment costs O(log log n), so the total incrementing cost is O(n log log n). If the item hasn't been added to the array yet the cost is O(log n), but that only happens O(log n) times.

    Sorting the distinct elements costs O(log n log log n), and outputting the final sorted answer with correct frequency costs O(n). Total cost: O(n log log n).
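The ordered-array variant can be sketched with Python's standard `bisect` module. The function name `sort_few_distinct` and the two parallel lists are assumptions of this illustration; because the keys are kept in sorted order throughout, no separate sorting pass over the distinct elements is needed here.

```python
from bisect import bisect_left


def sort_few_distinct(a):
    """Sort a by counting frequencies in an ordered associative array."""
    keys, counts = [], []                 # parallel lists; keys stays sorted
    for item in a:
        i = bisect_left(keys, item)       # binary search over the d distinct keys: O(log d)
        if i < len(keys) and keys[i] == item:
            counts[i] += 1
        else:
            keys.insert(i, item)          # O(d) shift, but only once per distinct key
            counts.insert(i, 1)
    out = []
    for key, count in zip(keys, counts):  # emit each key with its frequency: O(n) total
        out.extend([key] * count)
    return out


print(sort_few_distinct([3, 1, 3, 2, 1, 3]))
```

With d = O(log n) distinct keys, each lookup costs O(log log n), which matches the bound derived above.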

    Note: An AVL tree can also be used to implement the associative array.

    Alternate Idea: Use Quicksort with a 3-way partition [< pivot | = pivot | > pivot] and pivoting on the median.

    However, this is harder to analyze, and you need to know how to compute medians in linear time.

    (b) Sorting arrays with many duplicates is a special case of the sorting problem, and the Ω(n log n) bound only applies to sorting in the general case. If you have extra information about what you are trying to sort, you can possibly beat the lower bound, as in this case.


  • A3Q2. (a) Pseudocode for finding the height of a binary tree:

    height(node)
        if node is empty then
            return 0
        else
            return 1 + max(height(node.left), height(node.right))

    Cost: height is called on every node in the tree, so the cost is Ω(n).

    Also, height is called exactly once for every node and empty child in the tree. Every node has at most 2 empty children, so there are O(n) total height calls, each costing O(1).

    Thus the total cost is Θ(n).
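The counting argument can be checked with a small sketch. The `Node` class and the optional `counter` argument are assumptions added for illustration; the recursion itself follows the pseudocode in part (a), where an empty tree has height 0.

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def height(node, counter=None):
    """Height of a binary tree; counter[0], if given, tallies the calls made."""
    if counter is not None:
        counter[0] += 1
    if node is None:                      # empty tree
        return 0
    return 1 + max(height(node.left, counter), height(node.right, counter))


# A tree with n = 3 nodes: a root with a left child that has a right child.
tree = Node(left=Node(right=Node()))
calls = [0]
print(height(tree, calls), calls[0])
```

A binary tree with n nodes has n + 1 empty children, so the function is called 2n + 1 times in total, which is consistent with the linear bound.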

    (b) Pseudocode for finding the height of an AVL tree:

    height(node)
        if node is empty then
            return 0
        else if node.balance