Interval Trees

Page 1: Interval Trees. Useful for representing a set of intervals –E.g.: time intervals of various events Each interval i has a low[i] and a high[i] –Assume

Interval Trees

• Useful for representing a set of intervals
– E.g.: time intervals of various events

• Each interval i has a low[i] and a high[i]
– Assume closed intervals

Interval Properties

• Intervals i and j overlap iff:

low[i] ≤ high[j] and low[j] ≤ high[i]

• Intervals i and j do not overlap iff:

high[i] < low[j] or high[j] < low[i]

[Figure: example arrangements of two intervals i and j, overlapping and non-overlapping]

Interval Trichotomy

• Any two intervals i and j satisfy the interval trichotomy:

– exactly one of the following three properties holds:

a) i and j overlap

• low[i] ≤ high[j] and low[j] ≤ high[i]

b) i is to the left of j

• high[i] < low[j]

c) i is to the right of j

• high[j] < low[i]
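
The overlap test and the trichotomy above can be sketched in a few lines of Python. This is only an illustration: the names `Interval`, `overlaps` and `trichotomy` are assumptions introduced here, and closed intervals are assumed, as on the earlier slide.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A closed interval [low, high] (hypothetical helper type)."""
    low: float
    high: float


def overlaps(i: Interval, j: Interval) -> bool:
    # Closed intervals overlap iff low[i] <= high[j] and low[j] <= high[i].
    return i.low <= j.high and j.low <= i.high


def trichotomy(i: Interval, j: Interval) -> str:
    # Exactly one of the three cases holds for any pair of intervals.
    if overlaps(i, j):
        return "overlap"
    if i.high < j.low:
        return "left"   # i is to the left of j
    return "right"      # i is to the right of j: high[j] < low[i]
```

Because the intervals are closed, two intervals that share only an endpoint, such as [16, 21] and [21, 25], count as overlapping.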

Interval Trees

Def.: An interval tree is a red-black tree that maintains a dynamic set of elements, each element x containing an interval int[x].

• Operations on interval trees:

– INTERVAL-INSERT(T, x)

– INTERVAL-DELETE(T, x)

– INTERVAL-SEARCH(T, i)

Designing Interval Trees

1. Underlying data structure

– Red-black trees

– Each node x contains an interval int[x]; the key is low[int[x]]

– An inorder tree walk lists the intervals sorted by their low endpoint

2. Additional information

– max[x] = maximum endpoint value in the subtree rooted at x

3. Maintaining the information

max[x] = max(high[int[x]], max[left[x]], max[right[x]])

Constant work at each node, so insertion and deletion still take O(lg n) time

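[Figure: example interval tree; each node shows its interval [low, high] and its max value: [16, 21] max 30; [8, 9] max 23; [25, 30] max 30; [5, 8] max 10; [15, 23] max 23; [17, 19] max 20; [26, 26] max 26; [0, 3] max 3; [6, 10] max 10; [19, 20] max 20]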

Designing Interval Trees

4. Develop new operations

• INTERVAL-SEARCH(T, i):

– Returns a pointer to an element x in the interval tree T such that int[x] overlaps i, or NIL if no such element exists

• Idea: starting at the root, check whether int[x] overlaps i; if it does not:

– If left[x] ≠ NIL and max[left[x]] ≥ low[i], go left

– Otherwise, go right

[Figure: the same example interval tree as on the previous slide]

INTERVAL-SEARCH(T, i)

1. x ← root[T]

2. while x ≠ nil[T] and i does not overlap int[x]

3.     do if left[x] ≠ nil[T] and max[left[x]] ≥ low[i]

4.         then x ← left[x]

5.         else x ← right[x]

6. return x
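
A minimal Python sketch of the max field and of INTERVAL-SEARCH follows. It makes one simplifying assumption that the slides do not: the tree is a plain, unbalanced binary search tree keyed on the low endpoint rather than a red-black tree, so the O(lg n) bound is not guaranteed. The names `Node`, `insert`, `interval_search` and `build_example` are hypothetical.

```python
class Node:
    def __init__(self, low, high):
        self.low, self.high = low, high   # int[x] = [low, high]; key is low
        self.max = high                   # max endpoint in this subtree
        self.left = None
        self.right = None


def insert(root, low, high):
    """Insert [low, high] keyed on low and return the subtree root.

    No rebalancing is done (assumption: plain BST, not red-black).
    """
    if root is None:
        return Node(low, high)
    if low < root.low:
        root.left = insert(root.left, low, high)
    else:
        root.right = insert(root.right, low, high)
    # max[x] = max(high[int[x]], max[left[x]], max[right[x]])
    root.max = max(root.high,
                   root.left.max if root.left else root.high,
                   root.right.max if root.right else root.high)
    return root


def interval_search(root, low, high):
    """Return a node whose interval overlaps [low, high], or None."""
    x = root
    while x is not None and not (x.low <= high and low <= x.high):
        if x.left is not None and x.left.max >= low:
            x = x.left
        else:
            x = x.right
    return x


def build_example():
    """Build a tree holding the ten intervals of the example figure."""
    root = None
    for lo, hi in [(16, 21), (8, 9), (25, 30), (5, 8), (15, 23),
                   (17, 19), (26, 26), (0, 3), (6, 10), (19, 20)]:
        root = insert(root, lo, hi)
    return root
```

With this tree, searching for [22, 25] returns the node holding [15, 23], while searching for [11, 14] returns `None`, since no stored interval overlaps it.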

Example

[Figure: INTERVAL-SEARCH traced on the example interval tree for the query intervals i = [22, 25] and i = [11, 14]; the search for i = [11, 14] ends with x = NIL]

Fibonacci Heap

• Fibonacci heap: a collection of min-heap-ordered trees.

• The trees are rooted but unordered.

• Each node x has the following fields:

– x.p points to its parent

– x.child points to any one of its children

– the children of x are linked together in a circular doubly linked list

– x.left and x.right point to its left and right siblings

– x.degree: the number of children in the child list of x

– x.mark: indicates whether node x has lost a child since the last time x was made the child of another node

• H.min points to the root of the tree containing a minimum key

• H.n: the number of nodes in H
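
The node and heap fields listed above might be laid out as in the following Python sketch. The field names follow the slide; the `insert` method and the `root_keys` helper are assumptions added only to show how the circular doubly linked root list and H.min fit together, and are not taken from the slides.

```python
class FibNode:
    def __init__(self, key):
        self.key = key
        self.p = None        # parent
        self.child = None    # any one of the children
        self.left = self     # siblings form a circular doubly linked list
        self.right = self
        self.degree = 0      # number of children
        self.mark = False    # lost a child since last becoming a child?


class FibHeap:
    def __init__(self):
        self.min = None      # root of the tree containing a minimum key
        self.n = 0           # number of nodes in the heap

    def insert(self, key):
        """Add a one-node tree to the root list (hypothetical helper)."""
        x = FibNode(key)
        if self.min is None:
            self.min = x
        else:
            # Splice x into the root list just to the left of min.
            x.right = self.min
            x.left = self.min.left
            self.min.left.right = x
            self.min.left = x
            if key < self.min.key:
                self.min = x
        self.n += 1
        return x


def root_keys(heap):
    """Walk the circular root list once, starting at heap.min."""
    keys = []
    x = heap.min
    if x is None:
        return keys
    while True:
        keys.append(x.key)
        x = x.right
        if x is heap.min:
            break
    return keys
```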

Fibonacci Heap

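[Figure: (a) a Fibonacci heap holding the keys 3, 7, 17, 18, 23, 24, 26, 30, 35, 38, 39, 41, 46, 52, with H.min marking the root of the tree that contains the minimum key; (b) a second drawing of the same heap]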

Binomial Trees

• A binomial heap is a collection of binomial trees.

• The binomial tree Bk is an ordered tree defined recursively:

– B0 consists of a single node.

– Bk consists of two binomial trees Bk-1 linked together: the root of one is the leftmost child of the root of the other.

Binomial Trees

[Figure: Bk formed by linking two copies of Bk-1]

Binomial Trees

[Figure: the binomial trees B0 through B4, with the smaller binomial trees they contain labelled]

Binomial Trees

[Figure: Bk drawn as a root whose children are the roots of Bk-1, Bk-2, ..., B2, B1, B0]

Properties of Binomial Trees

For the binomial tree Bk:

1. There are 2^k nodes,

2. The height of the tree is k,

3. There are exactly C(k, i) nodes at depth i for i = 0, 1, ..., k, and

4. The root has degree k, which is greater than the degree of any other node; moreover, if the children of the root are numbered from left to right as k-1, k-2, ..., 0, then child i is the root of a subtree Bi.
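
These four properties can be checked mechanically for small k. The sketch below builds Bk by the recursive definition (linking two copies of Bk-1) and counts nodes per depth; `BNode`, `binomial_tree` and `depth_counts` are names assumed for this illustration.

```python
from math import comb


class BNode:
    def __init__(self):
        self.children = []   # ordered, leftmost child first


def binomial_tree(k):
    """Build B_k by linking two copies of B_(k-1)."""
    if k == 0:
        return BNode()
    a = binomial_tree(k - 1)
    b = binomial_tree(k - 1)
    a.children.insert(0, b)  # root of b becomes the leftmost child of a
    return a


def depth_counts(root):
    """Return a list whose entry i is the number of nodes at depth i."""
    counts = []
    level = [root]
    while level:
        counts.append(len(level))
        level = [c for node in level for c in node.children]
    return counts


if __name__ == "__main__":
    k = 4
    counts = depth_counts(binomial_tree(k))
    print(sum(counts), len(counts) - 1, counts)
    print([comb(k, i) for i in range(k + 1)])
```

The sum of `depth_counts` gives the node count (property 1), its length minus one gives the height (property 2), and its entries can be compared with `comb(k, i)` (property 3).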

Binomial Heaps

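Example: a binomial heap with n = 13 nodes

13 = <1, 1, 0, 1>_2 (bit positions 3, 2, 1, 0)

Consists of B0, B2, B3

[Figure: the binomial heap drawn as a root list starting at head[H], with trees B0, B2 and B3 holding the keys 10, 1, 12, 25, 18, 6, 8, 14, 29, 11, 17, 38, 27]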
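
The link between n and the trees in the heap can be stated as a one-line computation: bit k of n is set exactly when the heap contains a Bk. The function name `binomial_tree_orders` is an assumption made for this sketch.

```python
def binomial_tree_orders(n):
    """Orders k of the binomial trees B_k in a binomial heap with n nodes."""
    return [k for k in range(n.bit_length()) if (n >> k) & 1]
```

For n = 13 this yields the orders 0, 2 and 3, matching B0, B2 and B3 in the example.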

Branch & Bound Algorithms

Introduction

• The branch-and-bound design strategy is very similar to backtracking in that a state space tree is used to solve a problem.

• The differences are that the branch-and-bound method (1) does not limit us to any particular way of traversing the tree, and (2) is used only for optimization problems.

• A branch-and-bound algorithm computes a number (bound) at a node to determine whether the node is promising.

Introduction …

• The number is a bound on the value of the solution that could be obtained by expanding beyond the node.

• If that bound is no better than the value of the best solution found so far, the node is nonpromising. Otherwise, it is promising.

• The backtracking algorithm for the 0-1 Knapsack problem is actually a branch-and-bound algorithm.

• A backtracking algorithm, however, does not exploit the real advantage of using branch-and-bound.

Introduction …

• Besides using the bound to determine whether a node is promising, we can compare the bounds of promising nodes and visit the children of the one with the best bound.

• This approach is called best-first search with branch-and-bound pruning. The implementation of this approach is a modification of the breadth-first search with branch-and-bound pruning.
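
As an illustration of best-first search with branch-and-bound pruning, here is a hedged Python sketch for the 0-1 Knapsack problem. The bound used (fill the remaining capacity greedily, allowing a fraction of the first item that does not fit) is a common choice assumed here, not something stated on the slides; the function names are hypothetical, and all weights are assumed to be positive.

```python
import heapq


def bound(level, profit, weight, items, capacity):
    """Upper bound on the profit reachable from a node.

    Items from `level` on are added greedily; the first item that does
    not fit is added fractionally.
    """
    if weight > capacity:
        return 0.0
    b = float(profit)
    total = weight
    for p, w in items[level:]:
        if total + w <= capacity:
            total += w
            b += p
        else:
            b += (capacity - total) * p / w
            break
    return b


def knapsack_best_first(profits, weights, capacity):
    """Maximum total profit, found by expanding the best bound first."""
    # Sort by profit per unit weight so the greedy bound is valid.
    items = sorted(zip(profits, weights),
                   key=lambda pw: pw[0] / pw[1], reverse=True)
    best = 0
    # heapq is a min-heap, so bounds are stored negated.
    heap = [(-bound(0, 0, 0, items, capacity), 0, 0, 0)]
    while heap:
        neg_b, level, profit, weight = heapq.heappop(heap)
        if -neg_b <= best or level == len(items):
            continue  # nonpromising: cannot beat the best solution so far
        p, w = items[level]
        # Child that includes the item at this level.
        if weight + w <= capacity:
            best = max(best, profit + p)
            b = bound(level + 1, profit + p, weight + w, items, capacity)
            if b > best:
                heapq.heappush(heap, (-b, level + 1, profit + p, weight + w))
        # Child that excludes the item at this level.
        b = bound(level + 1, profit, weight, items, capacity)
        if b > best:
            heapq.heappush(heap, (-b, level + 1, profit, weight))
    return best
```

Replacing the priority queue with a FIFO queue would turn this into the breadth-first variant; the pruning test stays the same.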

Branch and Bound

• An enhancement of backtracking

• Applicable to optimization problems

• Uses a lower bound for the value of the objective function at each node (partial solution) when minimizing, or an upper bound when maximizing, so as to:

– guide the search through the state space

– rule out certain branches as “unpromising”

Sum of subsets

• Problem: Given n positive integers w1, ..., wn and a positive integer S, find all subsets of w1, ..., wn that sum to S.

• Example: n=3, S=6, and w1=2, w2=4, w3=6

• Solutions: {2,4} and {6}
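
One way to solve this problem is a backtracking search over the state space tree with two pruning tests. The sketch below is an assumption about how such a search could look, not the slide's own algorithm; `sum_of_subsets` and its inner `extend` are hypothetical names, and all weights are assumed to be positive.

```python
def sum_of_subsets(weights, target):
    """Return all subsets of `weights` (as lists) that sum to `target`."""
    w = sorted(weights)          # ascending order makes the first test valid
    solutions = []

    def extend(i, chosen, total, remaining):
        # `remaining` is the sum of w[i:], the weights not yet decided.
        if total == target:
            solutions.append(list(chosen))
            return
        if i == len(w):
            return
        # Nonpromising: the next (smallest) weight overshoots, or even
        # taking everything that is left cannot reach the target.
        if total + w[i] > target or total + remaining < target:
            return
        chosen.append(w[i])                      # include w[i]
        extend(i + 1, chosen, total + w[i], remaining - w[i])
        chosen.pop()                             # exclude w[i]
        extend(i + 1, chosen, total, remaining - w[i])

    extend(0, [], 0, sum(w))
    return solutions
```

For the example above, `sum_of_subsets([2, 4, 6], 6)` returns `[[2, 4], [6]]`.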

Augmenting Data Structures

When dealing with a new problem

• Data structures play an important role

• We must design or adopt a data structure

Only in rare situations

• We need to create an entirely new type of data structure

More often

• It suffices to augment a known data structure by storing additional information

• Then we can program new operations for the data structure to support the desired application

Augmenting Data Structures (2)

Not Always Easy

• The added information must be updated and maintained by the ordinary operations on the data structure.

Operations

• The augmented data structure (ADS) has operations inherited from the underlying data structure (UDS).

• UDS read/query operations are not a problem (e.g., the Min-Heap minimum query).

• UDS modify operations should update the additional information without adding too much cost (e.g., Min-Heap Extract-Min, Decrease-Key).

Dynamic Order Statistics

Example problem:

• Dynamic order statistics, where we need two operations:

• OS-SELECT(x, i): returns the ith smallest key in the subtree rooted at x

• OS-RANK(T, x): returns the rank (position) of x in the sorted (linear) order of tree T

Other operations

• Query: Search, Min, Max, Successor, Predecessor

• Modify: Insert, Delete

Dynamic Order Statistics (2)

The sorted (linear) order of a binary search tree T is determined by an inorder tree walk of T.

• IDEA:

• Use Red-Black (R-B) tree as the underlying data structure.

• Keep subtree size in nodes as additional information.

Dynamic Order Statistics (3)

Relation between subtree sizes:

• size[x] = size[left[x]] + size[right[x]] + 1 (the +1 counts the node itself)

Note on implementation:

• For convenience, use a sentinel NIL[T] such that size[NIL[T]] = 0

• This is needed because high-level languages do not support operations on NIL values (e.g., Java throws a NullPointerException)

Dynamic Order Statistics Notation

Node Structure:

• Key as with any Binary Search Tree (Tree is indexed according to key)

• Subtree Size as additional Data on Node.

[Figure: a node drawn with two fields, KEY and SUBTREE SIZE]
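
The size-augmented node and the two operations can be sketched as follows. Two simplifications are assumed here and are not from the slides: the tree is a plain, unbalanced BST instead of a red-black tree, and `os_rank` descends from the root by key (assuming distinct keys) instead of taking a node pointer. A small `size` helper plays the role of the sentinel with size 0.

```python
class OSNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1          # number of nodes in this subtree


def size(x):
    """Subtree size, treating a missing child like a sentinel of size 0."""
    return x.size if x is not None else 0


def os_insert(x, key):
    """Insert key into the subtree rooted at x; returns the subtree root."""
    if x is None:
        return OSNode(key)
    if key < x.key:
        x.left = os_insert(x.left, key)
    else:
        x.right = os_insert(x.right, key)
    x.size = size(x.left) + size(x.right) + 1
    return x


def os_select(x, i):
    """Node with the i-th smallest key (1-based) in the subtree at x."""
    while x is not None:
        r = size(x.left) + 1   # rank of x within its own subtree
        if i == r:
            return x
        if i < r:
            x = x.left
        else:
            i -= r
            x = x.right
    return None


def os_rank(root, key):
    """Rank of key in the whole tree, or None if key is absent."""
    rank = 0
    x = root
    while x is not None:
        if key < x.key:
            x = x.left
        elif key > x.key:
            rank += size(x.left) + 1   # skip x and its left subtree
            x = x.right
        else:
            return rank + size(x.left) + 1
    return None
```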

Amortized Analysis

• Consider not just one operation, but a sequence of operations on a given data structure.

• Average cost over a sequence of operations.

• Probabilistic analysis:

– Average-case running time: the average over all possible inputs for one algorithm (operation).

– If probability is used, it is called the expected running time.

• Amortized analysis:

– No probability is involved.

– Average performance over a sequence of operations, even if some operations are expensive.

– Guarantees the average performance of each operation in the sequence in the worst case.

Three Methods of Amortized Analysis

• Aggregate analysis:

– Total cost of n operations divided by n.

• Accounting method:

– Assign each type of operation a (possibly different) amortized cost,

– overcharge some operations,

– store the overcharge as credit on specific objects,

– then use the credit to pay for some later operations.

• Potential method:

– Same idea as the accounting method,

– but store the credit as “potential energy” of the data structure as a whole.

Example for amortized analysis

• Stack operations:

– PUSH(S, x): O(1)

– POP(S): O(1)

– MULTIPOP(S, k): cost min(s, k), where s is the number of objects on the stack

while not STACK-EMPTY(S) and k > 0

    do POP(S)

        k ← k - 1

• Consider a sequence of n PUSH, POP, and MULTIPOP operations.

– The worst-case cost of a single MULTIPOP in the sequence is O(n), since the stack size is at most n.

Aggregate Analysis

• In fact, a sequence of n operations on an initially empty stack costs at most O(n). Why?

Each object can be popped at most once (including by MULTIPOP) for each time it is pushed. So the number of POPs is at most the number of PUSHes, which is at most n.

Thus the average cost of an operation is O(n)/n = O(1).

In aggregate analysis, the amortized cost is defined to be the average cost.
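
The aggregate argument can be observed directly by counting unit costs. In this sketch each PUSH and each individual POP costs 1, and the class `CountingStack` and the driver `total_cost` are names assumed for the illustration; POP is assumed to be called only on a non-empty stack.

```python
class CountingStack:
    def __init__(self):
        self.items = []
        self.cost = 0            # total actual cost so far

    def push(self, x):
        self.items.append(x)
        self.cost += 1

    def pop(self):
        self.cost += 1
        return self.items.pop()  # assumes the stack is non-empty

    def multipop(self, k):
        # Pops min(s, k) objects, where s is the current stack size.
        while self.items and k > 0:
            self.pop()
            k -= 1


def total_cost(ops):
    """Run a list of (name, argument) operations and return the total cost."""
    s = CountingStack()
    for name, arg in ops:
        if name == "push":
            s.push(arg)
        elif name == "pop":
            s.pop()
        elif name == "multipop":
            s.multipop(arg)
        else:
            raise ValueError(name)
    return s.cost


if __name__ == "__main__":
    ops = ([("push", i) for i in range(5)] + [("multipop", 3)]
           + [("push", 5), ("push", 6)] + [("multipop", 10)])
    print(len(ops), total_cost(ops))
```

For any sequence of n operations on an initially empty stack, the total is at most 2n, because every unit of pop cost is matched by an earlier push.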

Amortized Analysis: Accounting Method

• Idea:

– Assign differing charges to different operations.

– The amount charged is called the amortized cost.

– The amortized cost may be more or less than the actual cost.

– When amortized cost > actual cost, the difference is saved on specific objects as credit.

– The credit can be used by later operations whose amortized cost < actual cost.

• By comparison, in aggregate analysis all operations have the same amortized cost.

The Potential Method

• Same as the accounting method: something prepaid is used later.

• Different from the accounting method:

– The prepaid work is stored not as credit, but as “potential energy”, or “potential”.

– The potential is associated with the data structure as a whole rather than with specific objects within the data structure.

The Potential Method (cont.)

• Initial data structure $D_0$,

• n operations, resulting in $D_0, D_1, \ldots, D_n$ with costs $c_1, c_2, \ldots, c_n$.

• A potential function $\Phi: \{D_i\} \to \mathbb{R}$ (real numbers)

• $\Phi(D_i)$ is called the potential of $D_i$.

• The amortized cost $c_i'$ of the ith operation is:

– $c_i' = c_i + \Phi(D_i) - \Phi(D_{i-1})$ (actual cost + potential change)

• $\sum_{i=1}^{n} c_i' = \sum_{i=1}^{n} \left(c_i + \Phi(D_i) - \Phi(D_{i-1})\right) = \sum_{i=1}^{n} c_i + \Phi(D_n) - \Phi(D_0)$

The Potential Method (cont.)

• If $\Phi(D_n) \ge \Phi(D_0)$, then the total amortized cost is an upper bound on the total actual cost.

• But we do not know in advance how many operations there will be, so $\Phi(D_i) \ge \Phi(D_0)$ is required for every i.

• It is convenient to define $\Phi(D_0) = 0$, and so $\Phi(D_i) \ge 0$ for all i.

• If the potential change is positive (i.e., $\Phi(D_i) - \Phi(D_{i-1}) > 0$), then $c_i'$ is an overcharge (the increase is stored as potential),

• otherwise, it is an undercharge (potential is discharged to pay the actual cost).
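
To make the definition concrete, the sketch below applies the potential method to the stack example, taking the potential of a stack to be the number of objects on it. That choice of potential function is an assumption made for this illustration (it is the usual textbook choice, but it does not appear on these slides), and the function name `amortized_costs` is hypothetical.

```python
def amortized_costs(ops):
    """Return (name, actual cost, amortized cost) for each operation.

    The potential is the current stack size, so the amortized cost is
    actual + size_after - size_before.
    """
    stack = []
    result = []
    for name, arg in ops:
        phi_before = len(stack)
        if name == "push":
            stack.append(arg)
            actual = 1
        elif name == "pop":
            stack.pop()                      # assumes a non-empty stack
            actual = 1
        elif name == "multipop":
            actual = min(len(stack), arg)    # number of objects removed
            del stack[len(stack) - actual:]
        else:
            raise ValueError(name)
        result.append((name, actual, actual + len(stack) - phi_before))
    return result
```

With this potential, every PUSH has amortized cost 2 and every POP or MULTIPOP has amortized cost 0, so the total amortized cost of n operations is at most 2n and bounds the total actual cost.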