View
11
Download
0
Category
Preview:
Citation preview
1
C. Shahabi
CSCI585CSCI585
Spatial Index Structures
Cyrus ShahabiComputer Science Department
University of Southern Californiashahabi@usc.edu
2
C. Shahabi
CSCI585CSCI585 Outline
IntroductionSpatial IndexingR-TreeR+-TreeQuad Trees
2
3
C. Shahabi
CSCI585CSCI585 Introduction
Spatial objects– Points, lines, rectangles, regions, …
Hierarchical data structures– Based on recursive decomposition, similar to divide and
conquer method, like B-tree.Why not B-Tree?– More than one dimension– Concept of closeness relies on all the dimensions of the
spatial dataSpatial index structures demo– http://www.cs.umd.edu/~brabec/quadtree/index.html
4
C. Shahabi
CSCI585CSCI585 Spatial Indexing
Mapping spatial object into point– In either same, lower, or higher dimensional spaces– Good for storage purposes– Problems with queries like finding the nearest objects
Bucketing methods– Based on spatial occupancy– Decomposing the space from which the data is drawn
• Minimum bounding rectangle (MBR) : e.g., R-Tree• Disjoint cells: e.g., R+-Tree• Blocks of uniform size• Distribution of the data: e.g., quadtree
data-dependent
greater degree of data-independence
3
5
C. Shahabi
CSCI585CSCI585 R-Tree[proposed in 1984 by Guttman]
Based on Minimum Bounding Rectangle (m,M)=(1,3)
R4
d g h
R3
R1
a b
a b
dg
h
R2
R5 R6
c i e fc
ie
f
�bounding rectangles could overlap each others (e.g., R3 vs R4)�an object is only associated with one bounding rectangle
6
C. Shahabi
CSCI585CSCI585 R-Tree[proposed in 1984 by Guttman]
Height-balanced tree similar to B-tree for k-dimensionsEvery leaf node contains between m (m ≤M/2) and M index records, unless it is the rootFor each index record (I, tuple-identifier) in a leaf node, I is the MBR that contains the n-dimensional data object represented by the indicated tupleEvery non-leaf node has between m and M children unless it is the rootFor each entry (I, child-pointer) in a non-leaf node, I is the MBR that spatially contains the rectangles in the child node.All leaves appear on the same levelThe root node has at least two children unless it is a leaf
4
7
C. Shahabi
CSCI585CSCI585 Insertion Processes
d
ef
i
gm
A
Bh
k lj
CX
A new index entry
X
ChooseLeaf
Has room
Install X
Yes
SplitNode
No
C
AdjustTree
C Adjust all related entries
Different variant:•Exhaustive•Quadratic•Linear•Packed•Hilbert Packed•…etc.
8
C. Shahabi
CSCI585CSCI585 Processes of Quadratic Spilt(page 52 in Guttman’s paper)
Pick first entry for each groupRun PickSeeds d
ef
i
gm
A
Bh
k lj
CX
d e f g h i j k l m
CBA
5
9
C. Shahabi
CSCI585CSCI585 Processes of Quadratic Spilt(page 52 in Guttman’s paper)
d
ef
i
gm
A
Bh
k lj
CX
d e f g h i j k l m
CBA
PickSeedsPS1 [Calculate inefficiency of grouping entries together]For each pair of E1 and E2, compose a rectangle R including E1 and E2Calculate d = area(R) - area(E1) - area(E2)
PS2 [Choose the most wasteful pair ] Choose the pair with the largest d
lj
k ll
Xm
l
10
C. Shahabi
CSCI585CSCI585 Processes of Quadratic Spilt(page 52 in Guttman’s paper)
Pick first entry for each group(PickSeeds)
d e f g h i j k l m
CBA
Check if done
G1 G2l m
Select entry to assign(PickNext)
No
i
gm
d
ef
A
Bh
k lj
G1
XG2
6
11
C. Shahabi
CSCI585CSCI585 Processes of Quadratic Spilt(page 52 in Guttman’s paper)
Pick first entry for each group(PickSeeds)
i
gm
d
ef
A
Bh
k lj
G1
X
Check if done
G1 G2l m
PickNextPN1 [Determine cost of putting each entry in each group] For each entry Ecalculate d1 = the increased MBR area required for G1 calculate d2 = the increased MBR area required for G2 PN2 [Find entry with greatest preference for one group]Choose the entry with the maximum difference between d1 and d2
No
G2
G1
G2
kG2
G1m
X
G2
G1j
12
C. Shahabi
CSCI585CSCI585 Processes of Quadratic Spilt(page 52 in Guttman’s paper)
Pick first entry for each group(PickSeeds)
d e f g h i j k l m
CBA
Check if done
G1 G2l m
Select entry to assign(PickNext)
No
G1 G2l mk
i
gm
d
ef
A
Bh
k lj
G1
XG2
7
13
C. Shahabi
CSCI585CSCI585
G2G2G1
G1
Processes of Quadratic Spilt(page 52 in Guttman’s paper)
Pick first entry for each group(PickSeeds)
Check if done
G1 G2l m
Select entry to assign(PickNext)
No
G1 G2l mk
i
gm
d
ef
A
Bh
k lj
G1
XG2
jX
14
C. Shahabi
CSCI585CSCI585 Processes of Quadratic Spilt(page 52 in Guttman’s paper)
Pick first entry for each group(PickSeeds)
Check if done
G1 G2l m
Select entry to assign(PickNext)
No
i
gm
d
ef
A
Bh
k lj
G1
XG2
G1 G2l mk X
8
15
C. Shahabi
CSCI585CSCI585 Processes of Quadratic Spilt(page 52 in Guttman’s paper)
Pick first entry for each group(PickSeeds)
Check if done
G1 G2l m
Select entry to assign(PickNext)
No
i
gm
d
ef
A
Bh
k lj
G1
XG2
G1 G2l mk X
j
d e f g h i l k
G2G1BA
m x j
16
C. Shahabi
CSCI585CSCI585 Your Exercise Before Midterm(m,M)=(2,4)
Build a R-Tree for these spatial dataHint: You could use the Spatial index structures demo application step by step
DF G
E HJ
K
L
9
17
C. Shahabi
CSCI585CSCI585 Search Object in R-Tree
i
g m
d
ef
A
Bh
k lj
G1
xG2
d e f g h i l k
G2G1BA
m x j
Input (T, j)
Leaf?
Yes
Search each subtree E of T
No
Overlap
(E,j)
Yes
Check each entry E of T
Overlap
Qualifyingrecord
Yes
18
C. Shahabi
CSCI585CSCI585 Main Drawbacks of R-Tree
R-tree is not unique, rectangles depend on how objects are inserted and deleted from the tree.In order to search some object you might have to go through several rectangles or the whole database– Why?– Solution?
10
19
C. Shahabi
CSCI585CSCI585 R+-Tree
Overcome problems with R-TreeIf node overlaps with several rectangles insert the node in allDecompose the space into disjoint cells
R2
R3 R4 R5 R6
R1
a bd g h c i e fh i c i
g
d
h
ci
f
e
b
20
C. Shahabi
CSCI585CSCI585 R+-Tree PropertiesR+-tree and cell-trees used approach of discomposing space into cells– R+-trees deals with collection of objects bounded by rectangles– Cell tree deals with collection of objects bounded by convex
polyhedronR+-trees is extension of k-d-B-treeRetrieval times are smallerWhen summing the objects, needs eliminate duplicatesAgain, it is data-dependent
11
21
C. Shahabi
CSCI585CSCI585 Quad Trees
Region Quadtree– The blocks are required to be disjoint– Have standard sizes (squares whose sides are power of two)– At standard locations– Based on successive subdivision of image array
into four equal-size quadrants– If the region does not cover the entire array, subdivide into
quadrants, sub-quadrants, etc.– A variable resolution data structure
22
C. Shahabi
CSCI585CSCI585 Example of Region Quadtree
12 3
4 5
67 8
9 10
11 12
13 14
15 16
17 1819
A
B C E
2
1
3 4 5 6 11 12D 13 14 19F
15 16 17 187 8 9 10
NW NE SW SE
3
311
11
12
23
C. Shahabi
CSCI585CSCI585 PR Quadtree
PR (Point-Region) quadtreeRegular decomposition (similar to Region quadtree)Independent of the order in which data points are inserted into it�: if two points are very close, decomposition can be very deep
24
C. Shahabi
CSCI585CSCI585 Example of PR Quadtree
(0,0) (100,0)
(100,100)(0,100)
Seattle(1,55)
Toronto(62,77)
Buffalo(82,65)
Denver(5,45)
Chicago(35,42)
Omaha(27,35) Mobile
(52,10)
Atlanta(85,15)
Miami(90,5)
A
Seattle(1,55)
Toronto(62,77)
Buffalo(82,65)
Denver(5,45)
Chicago(35,42)
Omaha(27,35)
Mobile(52,10)
Atlanta(85,15)
Miami(90,5)
B C E
D F
Subdivide into quadrants until the two points are located in different regions
13
25
C. Shahabi
CSCI585CSCI585 PM Quadtree
PM (Polygonal-Map) quadtree family– PM1 quadtree, PM2 quadtree, PM3 quadtree, PMR quadtree, … etc.
PM1 quadtree– Based on regular decomposition of space– Vertex-based implementation– Criteria
• At most one vertex can lie in a region represented by a quadtree leaf• If a region contains a vertex, it can contain no partial-edge that does not
include that vertex• If a region contains no vertices, it can contain at most one partial-edge
26
C. Shahabi
CSCI585CSCI585 PM Quadtree
PM1 quadtree
PM2 quadtree
PM3 quadtree
14
27
C. Shahabi
CSCI585CSCI585 Example of PM1 Quadtree
(0,0) (100,0)
(100,100)(0,100) Each node in a PM quadtreeis a collection of partial edges (and a vertex)Each point record has two field (x,y)Each partial edge has four field (starting_point, ending_point, left region, right region)
28
C. Shahabi
CSCI585CSCI585 Question?
Question?
15
29
C. Shahabi
CSCI585CSCI585 B-Tree: Definition
A B-Tree of order M is a height-balanced tree 1. All leaves are on the same level2. All nodes have at most M children3. All internal nodes except the root have at least M/2 children
=m
30
12 15 22
11
51
35 41
66 78
2 7 54 55 60 68 71 75 89 95100Back to R+ Tree
30
C. Shahabi
CSCI585CSCI585 B-Tree: InsertionA new index entry
ChooseLeaf
Has room
Install X
Yes
SplitNode
No
AdjustTreeback
Recommended