R-Trees: A Dynamic Index Structure for Spatial Searching

Antonin Guttman. In Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data (SIGMOD '84). ACM, New York, NY, USA



Outline

Introduction
R-Tree Index Structure
Searching and Updating
Performance Tests
Conclusion

Outline

Introduction
    Background
    Previous Works
R-Tree Index Structure
Searching and Updating
Performance Tests
Conclusion

Background

Motivation
    To handle spatial data efficiently
    Traditional databases are designed for one-dimensional data
Traditional index structures
    Hash tables
    B-trees and ISAM

Previous Works

Method              Disadvantage
Cell methods        Not good for dynamic structures
Quad trees          Do not take paging of secondary memory into account
K-D trees           Do not take paging of secondary memory into account
K-D-B trees         Useful only for point data
Corner stitching    Homogeneous primary memory; not efficient
Grid files

Outline

Introduction
R-Tree Index Structure
    R-Tree Index Structure
    Properties of the R-Tree
    Example of an R-Tree
Searching and Updating
Performance Tests
Conclusion

R-Tree Index Structure

What is an R-tree?
    A height-balanced tree similar to a B-tree
    No periodic reorganization is needed
What do the nodes contain?
    (I, tuple-identifier) in a leaf node
    (I, child-pointer) in a non-leaf node
An R-tree must satisfy the following properties

Properties of the R-Tree

Let M be the maximum number of entries that will fit in one node
Let m <= M/2 be a parameter specifying the minimum number of entries in a node

Properties of the R-Tree

1. Every leaf node contains between m and M index records unless it is the root
2. For each index record (I, tuple-identifier) in a leaf node, I is the smallest rectangle that spatially contains the n-dimensional data object represented by the indicated tuple
3. Every non-leaf node has between m and M children unless it is the root
4. For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node
5. The root node has at least two children unless it is a leaf
6. All leaves appear on the same level
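The properties above can be turned into a small consistency check. The following is a minimal sketch, assuming two-dimensional rectangles stored as (xmin, ymin, xmax, ymax) tuples; the names Entry, Node, cover and check are hypothetical and do not come from the slides. Property 2 is not checked because it needs the data objects themselves.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def cover(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing every rectangle in rects."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class Entry:
    rect: Rect                        # I
    child: Optional["Node"] = None    # child-pointer (non-leaf entries)
    tuple_id: Optional[int] = None    # tuple-identifier (leaf entries)


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def check(node: Node, m: int, M: int, is_root: bool = True) -> int:
    """Assert properties 1, 3, 4, 5 and 6 below node; return the subtree height."""
    n = len(node.entries)
    if is_root:
        assert node.leaf or n >= 2             # property 5
        assert n <= M
    else:
        assert m <= n <= M                     # properties 1 and 3
    if node.leaf:
        return 1
    heights = set()
    for e in node.entries:
        # property 4: the entry rectangle tightly encloses the child's rectangles
        assert e.rect == cover([c.rect for c in e.child.entries])
        heights.add(check(e.child, m, M, is_root=False))
    assert len(heights) == 1                   # property 6: leaves on one level
    return 1 + heights.pop()
```

Calling check(root, m, M) on a valid tree returns its height; any violated property raises an AssertionError.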

Example of an R-Tree

Outline

Introduction
R-Tree Index Structure
Searching and Updating
    Searching
    Example of Searching
    Insertion
    Updates and Other Operations
    Node Splitting
Performance Tests
Conclusion

Searching

Problem definition
    Given an R-tree whose root node is T, find all index records whose rectangles overlap a search rectangle S
Notation
    EI is the rectangle part of an index entry E
    Ep is the tuple-identifier or child-pointer of E

Searching

Search(T, LIST) {    // S is the search rectangle
    IF (T is not a leaf) {
        // descend into every subtree whose rectangle overlaps S
        FOR EACH (E in T)
            IF (EI overlaps S) Search(Ep, LIST);
    } ELSE {
        // collect every qualifying index record
        FOR EACH (E in T)
            IF (EI overlaps S) LIST.ADD(Ep);
    }
}
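The search procedure can be sketched in runnable form as follows. This is an illustration under stated assumptions, not the paper's code: rectangles are two-dimensional (xmin, ymin, xmax, ymax) tuples, touching edges count as overlap, and the names Entry, Node, overlaps and search are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the closed rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None
    tuple_id: Optional[int] = None


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def search(node: Node, s: Rect, out: List[int]) -> List[int]:
    """Append to out the tuple ids of all index records overlapping s."""
    for e in node.entries:
        if overlaps(e.rect, s):
            if node.leaf:
                out.append(e.tuple_id)      # qualifying index record
            else:
                search(e.child, s, out)     # descend into the overlapping subtree
    return out
```

Because sibling rectangles may overlap, more than one subtree can be visited for a single query; this is why the worst-case cost is not bounded the way it is in a B-tree.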


Example of Searching

Insertion

Inserting a record is similar to insertion in a B-tree: new records are added to the leaves, nodes that overflow are split, and splits propagate up the tree

Insert(T, E) {
    L = ChooseLeaf(T, E);
    LL = NULL;
    IF (L has room for another entry)
        INSTALL E in L;
    ELSE
        LL = SplitNode(L, E);    // L and LL now hold E and all old entries of L
    AdjustTree(L, LL);
    IF (the root was split)
        create a new root whose children are the two resulting nodes;
}

Insertion - ChooseLeaf()

N ChooseLeaf(T, E) {
    SET N = T;
    IF (N is a non-leaf node) {
        find the entry F in N whose rectangle FI needs the least enlargement to include EI;
        return ChooseLeaf(Fp, E);
    } ELSE
        return N;
}
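A minimal sketch of the least-enlargement descent, written iteratively. It assumes two-dimensional (xmin, ymin, xmax, ymax) rectangles; the helper names are hypothetical, and the tie-break by smaller area is an addition that the slide does not show.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(r: Rect, added: Rect) -> float:
    """Extra area r needs in order to include added."""
    return area(union(r, added)) - area(r)


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def choose_leaf(root: Node, rect: Rect) -> Node:
    """Descend from root to the leaf where a new entry with rect should go."""
    node = root
    while not node.leaf:
        # least enlargement first; ties go to the smaller rectangle (assumption)
        best = min(node.entries,
                   key=lambda e: (enlargement(e.rect, rect), area(e.rect)))
        node = best.child
    return node
```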

Insertion - AdjustTree()

AdjustTree(L, LL) {
    SET N = L; SET NN = LL;
    IF (N is root)    // check if done
        return;
    // the next three lines adjust the covering rectangle in the parent entry
    SET P = N.parent;
    SET En to be N's entry in P;
    ADJUST EnI so that it tightly encloses all entry rectangles in N;
    SET PP = NULL;
    IF (NN != NULL) {
        CREATE Enn;    // Ennp = NN, EnnI encloses all rectangles in NN
        IF (P has room for another entry) P.add(Enn);
        ELSE PP = SplitNode(P, Enn);
    }
    AdjustTree(P, PP);    // always move up to the next level
}

Deletion

Remove index record E from an R-tree

Delete(T, E) {
    L = FindLeaf(T, E);
    IF (L != NULL) {
        Remove(E, L);    // remove E from L
        CondenseTree(L);
        IF (root node has only one child)
            make the child the new root;
    }
}

Deletion - FindLeaf()

Given an R-tree whose root node is T, find the leaf node containing the index entry E

L FindLeaf(T, E) {
    IF (T is not a leaf) {
        FOR EACH (F in T) {
            IF (FI overlaps EI) {
                L = FindLeaf(Fp, E);
                IF (L != NULL) return L;    // found in this subtree
            }
        }
    } ELSE {
        FOR EACH (F in T)
            IF (F matches E) return T;
    }
    return NULL;    // E is not in this subtree
}

Deletion - CondenseTree()

CondenseTree(L) {
CT1: SET N = L;
     SET Q = empty;    // the set of eliminated nodes
CT2: IF (N is root) GOTO CT6;
     SET P = N.parent;
     SET En to be N's entry in P;
CT3: IF (N has fewer than m entries) {
         DELETE (En, P);    // delete En from P
         Q.add(N);
     } ELSE {
CT4:     adjust EnI to tightly contain all entries in N;
     }
CT5: SET N = P; GOTO CT2;
CT6: FOR EACH (node in Q)
         re-insert its entries into T;    // leaf entries via Insert, higher-level entries at their original level
}

Updates and Other Operations

Update
    An update is performed as a deletion followed by a re-insertion
Other operations
    Finding all data objects completely contained in a search area, or all objects that contain a search area
    Range deletion
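The two containment queries mentioned above can be sketched as variants of the overlap search with a different test at each level. This is an illustration under assumptions: two-dimensional (xmin, ymin, xmax, ymax) rectangles, hypothetical names (query, mode, "within", "covers"), and leaf tests applied to bounding rectangles, so for "covers" the result is a candidate set that may still need checking against the actual objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def overlaps(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None
    tuple_id: Optional[int] = None


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def query(node: Node, s: Rect, mode: str, out: List[int]) -> List[int]:
    """mode "within": records inside s; mode "covers": records containing s."""
    for e in node.entries:
        if node.leaf:
            hit = contains(s, e.rect) if mode == "within" else contains(e.rect, s)
            if hit:
                out.append(e.tuple_id)
        else:
            # A subtree can hold records inside s only if its rectangle overlaps s,
            # and records covering s only if its own rectangle covers s.
            ok = overlaps(e.rect, s) if mode == "within" else contains(e.rect, s)
            if ok:
                query(e.child, s, mode, out)
    return out
```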

Node Splitting

A node must be split when an entry is inserted into a node that is already full
The total area of the two covering rectangles after a split should be minimized, because it strongly affects search efficiency
There are three kinds of splitting algorithms: the exhaustive algorithm, the quadratic-cost algorithm, and the linear-cost algorithm

Node Splitting - Exhaustive Algorithm

The most straightforward approach: generate all possible groupings and choose the best
Its main disadvantage is its high time complexity, since a reasonable value of M is about 200 (4096/4/(4+1) = 204.8)
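To make the cost concrete, the sketch below enumerates every grouping for a small node and counts the groupings for larger ones. It assumes two-dimensional (xmin, ymin, xmax, ymax) rectangles and measures a split by the sum of the two covering areas; the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def cover(rects):
    """Smallest rectangle (xmin, ymin, xmax, ymax) enclosing all rects."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def count_groupings(n, m):
    """Ways to split n entries into two unordered groups of at least m entries."""
    return sum(comb(n, k) for k in range(m, n - m + 1)) // 2


def exhaustive_split(rects, m):
    """Try every grouping; return (cost, group_a, group_b) with the smallest total area."""
    n = len(rects)
    best = None
    for k in range(m, n - m + 1):
        for chosen in combinations(range(n), k):
            if 0 not in chosen:       # keep entry 0 in group A to skip mirrored splits
                continue
            a = [rects[i] for i in chosen]
            b = [rects[i] for i in range(n) if i not in chosen]
            cost = area(cover(a)) + area(cover(b))
            if best is None or cost < best[0]:
                best = (cost, a, b)
    return best
```

With m = 1, count_groupings(n, 1) equals 2^(n-1) - 1, so the number of candidate splits doubles with every additional entry. That is manageable for a handful of entries but not for a node holding a couple of hundred.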

Node Splitting - Quadratic-Cost Algorithm

It attempts to find a small-area split, but is not guaranteed to find one with the smallest area possible
The cost is quadratic in M and linear in the number of dimensions
Process
1. Pick first entry for each group
2. Check if done
3. Select entry to assign

Quadratic-Cost Algorithm - PickSeeds()

Select two entries to be the first elements of the groups
Process
1. Calculate inefficiency of grouping entries together
2. Choose the most wasteful pair
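One way to quantify the inefficiency of a pair is the dead space left when both rectangles are enclosed together: the area of their joint covering rectangle minus the two individual areas. The sketch below uses that measure as an assumption, since the slide does not spell out the formula; rectangles are (xmin, ymin, xmax, ymax) tuples and the names are hypothetical.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pick_seeds(rects):
    """Return the indices (i, j) of the pair that wastes the most area if grouped."""
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    # examining every pair is what makes this step quadratic in the node size
    return max(combinations(range(len(rects)), 2), key=waste)
```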

Quadratic-Cost Algorithm - PickNext()

Select one remaining entry for classification in a group
Process
1. Determine cost of putting each entry in each group
2. Find entry with greatest preference for one group
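A minimal sketch of how PickNext fits into the whole quadratic split. The cost of putting an entry in a group is taken to be the enlargement of that group's covering rectangle, and preference is the difference between the two costs. The tie-break order (smaller enlargement, then smaller area, then fewer entries), the rule for honouring the minimum m, and all names are assumptions made for illustration; rectangles are (xmin, ymin, xmax, ymax) tuples.

```python
from itertools import combinations


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(r, added):
    return area(union(r, added)) - area(r)


def pick_seeds(rects):
    def waste(p):
        return area(union(rects[p[0]], rects[p[1]])) - area(rects[p[0]]) - area(rects[p[1]])
    return max(combinations(range(len(rects)), 2), key=waste)


def pick_next(rest, cover_a, cover_b):
    """Index in rest of the entry with the greatest preference for one group."""
    return max(range(len(rest)),
               key=lambda i: abs(enlargement(cover_a, rest[i]) - enlargement(cover_b, rest[i])))


def quadratic_split(rects, m):
    """Split rects into two groups of at least m entries each."""
    i, j = pick_seeds(rects)
    a, b = [rects[i]], [rects[j]]
    cov_a, cov_b = rects[i], rects[j]
    rest = [r for k, r in enumerate(rects) if k not in (i, j)]
    while rest:
        # if a group needs every remaining entry to reach m, hand them over
        if len(a) + len(rest) <= m:
            a += rest
            break
        if len(b) + len(rest) <= m:
            b += rest
            break
        r = rest.pop(pick_next(rest, cov_a, cov_b))
        key_a = (enlargement(cov_a, r), area(cov_a), len(a))
        key_b = (enlargement(cov_b, r), area(cov_b), len(b))
        if key_a <= key_b:
            a.append(r)
            cov_a = union(cov_a, r)
        else:
            b.append(r)
            cov_b = union(cov_b, r)
    return a, b
```

Each round scans all remaining entries, and there are at most M rounds, which matches the quadratic cost stated for this algorithm.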

Node Splitting - Linear-Cost Algorithm

It is linear in M and in the number of dimensions
It is identical to Quadratic Split but uses different versions of PickSeeds and PickNext
Process
1. Find extreme rectangles along all dimensions
2. Adjust for shape of the rectangle cluster
3. Select the most extreme pair
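The three steps can be sketched as a linear-time seed picker. The reading used here is an interpretation of the slide: "extreme rectangles" are the one with the highest low side and the one with the lowest high side in each dimension, and "adjust for shape" means dividing their separation by the total width of the set in that dimension. Rectangles are two-dimensional (xmin, ymin, xmax, ymax) tuples; the function name and the (0, 1) fallback for degenerate inputs are hypothetical.

```python
def linear_pick_seeds(rects):
    """Return indices of two seed entries using one pass per dimension."""
    best = None
    for d in range(2):                       # 0 = x, 1 = y
        lo, hi = d, d + 2                    # positions of the low and high sides
        idx = range(len(rects))
        # step 1: extreme rectangles along this dimension
        high_low = max(idx, key=lambda k: rects[k][lo])
        low_high = min(idx, key=lambda k: rects[k][hi])
        width = max(r[hi] for r in rects) - min(r[lo] for r in rects)
        if high_low == low_high or width == 0:
            continue                         # no usable pair in this dimension
        # step 2: normalize the separation by the extent of the whole set
        sep = (rects[high_low][lo] - rects[low_high][hi]) / width
        # step 3: keep the most extreme pair over all dimensions
        if best is None or sep > best[0]:
            best = (sep, low_high, high_low)
    if best is None:
        return (0, 1)                        # fallback for degenerate input (assumption)
    return best[1], best[2]
```

Unlike the quadratic version, no pair of entries is compared directly, so the work grows with the number of entries times the number of dimensions.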

Outline

Introduction
R-Tree Index Structure
Searching and Updating
Performance Tests
    Performance Tests
        CPU Cost of Inserting Records
        CPU Cost of Deleting Records
        Search Performance: Pages Touched
        Search Performance: CPU Cost
        Space Efficiency
    Second Series of Tests
        CPU Cost of Inserts and Deletes vs. Amount of Data
        Search Performance vs. Amount of Data: Pages Touched
        Search Performance vs. Amount of Data: CPU Cost
        Space Required for R-Tree vs. Amount of Data
Conclusion

Performance Tests

R-trees were implemented in C under Unix on a VAX 11/780 computer
The purpose was to choose values for M and m, and to evaluate different node-splitting algorithms
Five page sizes were tested, corresponding to different values of M
Values tested for m were M/2, M/3, and 2
All tests used two-dimensional data

Bytes per Page    Max Entries per Page (M)
128               6
256               12
512               25
1024              50
2048              102


CPU Cost of Inserting Records


CPU Cost of Deleting Records

Search Performance: Pages Touched

Search Performance: CPU Cost


Space Efficiency

Second Series of Tests

This series measured R-tree performance as a function of the amount of data in the index
The same sequence of test operations as before was run on samples containing 1057, 2238, 3295, and 4559 rectangles
Parameters
    Linear algorithm with m = 2
    Quadratic algorithm with m = M/3
    Both with a page size of 1024 bytes (M = 50)


CPU Cost of Inserts and Deletes vs. Amount of Data

Search Performance vs. Amount of Data: Pages Touched

Search Performance vs. Amount of Data: CPU Cost


Space Required for R-Tree vs. Amount of Data

Outline

Introduction
R-Tree Index Structure
Searching and Updating
Performance Tests
Conclusion

Conclusion

The author proposed a useful index structure, named the R-tree, for multi-dimensional data
The author also gave three different node-splitting algorithms, ran tests on them, and concluded that the linear node-split algorithm is the most efficient approach
The R-tree would be easy to add to any relational database system