On Spatial-Range Closest Pair Query
Jing Shan, Donghui Zhang and Betty Salzberg
College of Computer and Information Science, Northeastern University
SSTD03 --- Santorini, Greece
Outline
Problem Definition
Straightforward Approach
Existing Technique
Our Method
Performance
Problem Definition
Given a spatial data set S, the Range Closest Pair query regarding a spatial range R finds a pair of objects (s1, s2) with s1 ∈ R and s2 ∈ R such that the distance between s1 and s2 is the smallest distance between any two objects inside range R.
[Figure: objects a through j in the plane, with query range R.]
Query result is (e, f).
Straightforward Approach
1. Use an R-tree to select the objects in the query range.
2. Find the closest pair by checking objects in the selection result.
   We could do a nested loop; better approaches exist, e.g. a plane-sweep Voronoi diagram method, which is O(n log n).
Problems:
   Have to access all data pages of the R-tree which intersect the query range.
   The data in the query range may not fit in memory.
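The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a linear scan stands in for the R-tree range selection, the nested loop is used for step 2, and the function names and sample coordinates are made up for the example.

```python
from itertools import combinations
from math import dist


def in_range(p, rect):
    """rect = (xmin, ymin, xmax, ymax); points on the boundary count as inside."""
    xmin, ymin, xmax, ymax = rect
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def range_closest_pair_naive(points, rect):
    # Step 1: select the objects in the query range
    # (a linear scan here; the slides use an R-tree for this step).
    selected = [p for p in points if in_range(p, rect)]
    # Step 2: nested loop over the selection result, O(n^2) distance checks.
    best, best_d = None, float("inf")
    for p, q in combinations(selected, 2):
        d = dist(p, q)
        if d < best_d:
            best, best_d = (p, q), d
    return best, best_d


points = [(0, 0), (1, 1), (5, 5), (5, 6), (9, 9), (9.1, 9)]
print(range_closest_pair_naive(points, (0, 0, 6, 6)))  # (((5, 5), (5, 6)), 1.0)
```

Note that the globally closest pair, (9, 9) and (9.1, 9), is ignored because it lies outside the range, and that every selected object is touched, which is the cost the slides point out.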
Note on Existing Techniques
[Hjaltason and Samet 98]: incremental join.
[Corral, Manolopoulos, Theodoridis and Vassilakopoulos 00]: an improved version, using pruning.
They addressed a slightly different problem:
   No query range.
   Joining two different R-trees.
Existing techniques do not perform well if there is overlap between the two R-trees. In case the two R-trees are identical, there is extensive overlap.
MinDist
Given two MBRs A, B of R-tree nodes, MinDist(A, B) is the smallest distance between A and B boundaries.
∀ object o1 ∈ A and o2 ∈ B, distance(o1, o2) ≥ MinDist(A, B).
[Figure: two MBRs A and B with MinDist marked between them.]
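For axis-aligned rectangles in the plane, MinDist has a simple closed form: take the gap between the two rectangles along each axis (zero if they overlap on that axis) and combine the gaps with the Euclidean norm. A minimal sketch, assuming MBRs are given as (xmin, ymin, xmax, ymax) tuples (a representation chosen here for illustration):

```python
from math import hypot


def min_dist(a, b):
    """Smallest distance between MBRs a and b, each (xmin, ymin, xmax, ymax).

    Returns 0.0 when the rectangles overlap or touch.
    """
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)  # horizontal gap
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)  # vertical gap
    return hypot(dx, dy)


print(min_dist((0, 0, 1, 1), (4, 5, 6, 7)))  # 5.0
print(min_dist((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0
```

The second call shows why self pairs are a problem for a join keyed on MinDist: an MBR paired with itself always has MinDist 0.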
Existing Technique
1. T=∞; closestpair=NULL.
2. Push the pair of root entries into priority queue Q.
3. While Q is not empty
   1. Pop (e1, e2) from Q whose MinDist is the smallest.
   2. If e1 points to an index node,
         For every child entry se1 in Node(e1) and child entry se2 in Node(e2)
            If MinDist(se1, se2)<T, push (se1, se2) into Q.
   3. Else /* e1 points to a leaf node */
         For every object o1 in Node(e1) and object o2 in Node(e2)
            If distance(o1, o2)<T, update T=distance(o1,o2) and closestpair=(o1,o2), and remove pairs from Q with MinDist no smaller than T.
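The loop above can be sketched in Python for the self-join case (one tree joined with itself). This is an illustration, not the cited authors' code. The Node class, the in-memory tree and the coordinates are hypothetical. The tree is assumed balanced, so both entries of a pair sit at the same level. Instead of physically removing pairs from Q when T drops, the loop stops once the popped MinDist is no smaller than T, which has the same effect because Q is ordered by MinDist.

```python
import heapq
from itertools import combinations, combinations_with_replacement, count, product
from math import dist, hypot, inf


def min_dist(a, b):
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


class Node:
    """Hypothetical R-tree node: a leaf holds points, an index node holds children."""

    def __init__(self, children=(), points=()):
        self.children, self.points = list(children), list(points)
        corners = self.points + [c for n in self.children
                                 for c in (n.mbr[:2], n.mbr[2:])]
        xs, ys = zip(*corners)
        self.mbr = (min(xs), min(ys), max(xs), max(ys))


def closest_pair_join(root):
    T, closest = inf, None
    tie = count()  # tie-breaker so the heap never compares Node objects
    Q = [(0.0, next(tie), root, root)]
    while Q:
        d, _, n1, n2 = heapq.heappop(Q)
        if d >= T:
            break  # every remaining pair has MinDist >= T
        same = n1 is n2
        if n1.children:  # index node: push child pairs
            pairs = (combinations_with_replacement(n1.children, 2) if same
                     else product(n1.children, n2.children))
            for c1, c2 in pairs:
                md = min_dist(c1.mbr, c2.mbr)
                if md < T:
                    heapq.heappush(Q, (md, next(tie), c1, c2))
        else:  # leaf node: compare objects, never an object with itself
            pairs = (combinations(n1.points, 2) if same
                     else product(n1.points, n2.points))
            for o1, o2 in pairs:
                dd = dist(o1, o2)
                if dd < T:
                    T, closest = dd, (o1, o2)
    return closest, T


root = Node(children=[Node(points=[(0, 0), (1, 3)]),
                      Node(points=[(5, 0), (6, 4)]),
                      Node(points=[(5.5, 0.5), (9, 9)])])
print(closest_pair_join(root))
```

In this made-up tree the closest pair, (5, 0) and (5.5, 0.5), spans two leaves whose MBRs overlap. Every self pair enters Q with MinDist 0, so each one is popped and expanded until T itself reaches 0.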
Example
[Figure: objects a through i in leaf MBRs A, B, C, D under root R; leaves listed as a,b | f,i | c,e,g | d,h below entries A B C D.]
Q: (R,R), then (A,A) (B,B) (C,C) (D,D) (A,C) (B,C) (A,B) (C,D) (A,D) (B,D)
Step 1: T = ∞; closestpair=NULL
Step 2: T = distance(a, b); closestpair=(a, b)
Step 3: T = distance(f, e); closestpair=(f, e)
MinExistDist
[Figure: two MBRs A and B with MinDist and MinExistDist marked.]
Given two MBRs A, B of R-tree nodes, MinExistDist(A, B) is the smallest distance for which a pair of objects, one in A and the other in B, is guaranteed to exist at no more than that distance apart.
∃ object o1 ∈ A and o2 ∈ B, distance(o1, o2) ≤ MinExistDist(A, B).
Usage [CMT+00]: if MinExistDist(A, B) is smaller than T, update T. This increases the chance of eliminating pairs from Q early.
Involving a Query Range
[Figure: two cases of MBR pairs with a query range, each with MinDist marked; MinExistDist = ∞ in one case and is marked in the other.]
We extend the MinExistDist…
Motivation for Our Method
The existing technique joins all self-pairs, e.g. (A,A), (B,B), …
   Reason: the MinDist of any self pair is 0.
   Challenge: is it possible to make it non-zero?
If MinDist(A,A) ≥ T, no need to process (A,A)!
We propose two ways to augment the R-tree with additional information. We call the augmented structures the Self-Range Closest-Pair Tree, or SRCP-tree for short.
SRCP-tree (version 1)
Along with each index entry, store the closest pair of objects in the sub-tree.
Check the closest pair stored along with the root entry. If both objects are inside the query range R, return that pair.
For each self pair to be pushed into Q, use the distance of the local closest pair (rather than 0) as the MinDist.
If we encounter an index entry where both objects in the closest pair are inside R, compare their distance with T; this may decrease T.
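A compact Python sketch of how these three rules could fit into the queue-based join is shown below. It is an interpretation of the slide, not the authors' implementation. The Node class and its cp, cp_dist and objects fields are hypothetical, the stored closest pair is computed by brute force at construction time for brevity, the tree is assumed balanced, and the sample coordinates are made up.

```python
import heapq
from itertools import combinations, combinations_with_replacement, count, product
from math import dist, hypot, inf


def inside(p, r):
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def min_dist(a, b):
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


class Node:
    """Hypothetical node augmented with the closest pair of its sub-tree."""

    def __init__(self, children=(), points=()):
        self.children, self.points = list(children), list(points)
        corners = self.points + [c for n in self.children
                                 for c in (n.mbr[:2], n.mbr[2:])]
        xs, ys = zip(*corners)
        self.mbr = (min(xs), min(ys), max(xs), max(ys))
        self.objects = self.points + [o for n in self.children for o in n.objects]
        # Version 1 augmentation (brute force here, for illustration only).
        self.cp = min(combinations(self.objects, 2),
                      key=lambda pq: dist(*pq), default=None)
        self.cp_dist = dist(*self.cp) if self.cp else inf


def srcp_v1_query(root, R):
    T, best = inf, None
    tie = count()
    Q = [(root.cp_dist, next(tie), root, root)]
    while Q:
        key, _, n1, n2 = heapq.heappop(Q)
        if key >= T:
            break
        same = n1 is n2
        if same and n1.cp and inside(n1.cp[0], R) and inside(n1.cp[1], R):
            # Stored pair lies in R: nothing below this entry can beat it.
            T, best = n1.cp_dist, n1.cp
            continue
        if n1.children:
            pairs = (combinations_with_replacement(n1.children, 2) if same
                     else product(n1.children, n2.children))
            for c1, c2 in pairs:
                if not (intersects(c1.mbr, R) and intersects(c2.mbr, R)):
                    continue
                # Self pairs use the local closest-pair distance instead of 0.
                k = c1.cp_dist if c1 is c2 else min_dist(c1.mbr, c2.mbr)
                if k < T:
                    heapq.heappush(Q, (k, next(tie), c1, c2))
        else:
            pairs = (combinations(n1.points, 2) if same
                     else product(n1.points, n2.points))
            for o1, o2 in pairs:
                if inside(o1, R) and inside(o2, R) and dist(o1, o2) < T:
                    T, best = dist(o1, o2), (o1, o2)
    return best, T


tree = Node(children=[
    Node(children=[Node(points=[(0, 0), (1, 3)]), Node(points=[(5, 0), (6, 4)])]),
    Node(children=[Node(points=[(5.5, 0.5), (9, 9)]), Node(points=[(8, 2), (8.2, 2.1)])]),
])
print(srcp_v1_query(tree, (0, 0, 7, 5)))
```

For a range covering all the data, the query returns after inspecting only the root entry, which corresponds to the first rule on the slide.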
Insertion
When a new object o is inserted, only the augmented information along the insertion path needs to be updated. (But sub-trees need to be visited.)
At each such entry, let the original local closest pair be (a,b). It needs to be updated only if distance(o, o') < distance(a,b) for some object o' in the sub-tree.
[Figure: the new object o, the stored pair (a,b), and distance(a,b).]
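The update rule for one entry on the insertion path can be sketched as follows. The dictionary layout (objects, cp, cp_dist) is a hypothetical stand-in for an index entry, and the scan over all objects in the sub-tree stands in for the sub-tree visit; a real tree would only need to look at objects within distance(a,b) of o.

```python
from math import dist


def update_entry_on_insert(entry, o):
    """Refresh the stored closest pair of one entry after inserting object o."""
    for other in entry["objects"]:
        d = dist(o, other)
        if d < entry["cp_dist"]:  # o forms a closer pair than the stored (a,b)
            entry["cp"], entry["cp_dist"] = (o, other), d
    entry["objects"].append(o)


entry = {"objects": [(0, 0), (4, 0)], "cp": ((0, 0), (4, 0)), "cp_dist": 4.0}
update_entry_on_insert(entry, (10, 10))  # too far away: stored pair unchanged
update_entry_on_insert(entry, (1, 0))    # closer to (0, 0) than the stored pair
print(entry["cp"], entry["cp_dist"])     # ((1, 0), (0, 0)) 1.0
```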
SRCP-tree (version 2)
Idea: while version 1 tries to avoid processing self pairs, version 2 of the structure tries to avoid processing sibling pairs.
E.g. if R has children A, B, C, D, version 1 cannot avoid pair (A,B) unless MinDist(A,B) ≥ T. Similarly, it has to process (A,C), (A,D), (B,C), (B,D), (C,D).
In version 2, every index entry e stores the “local-parent closest pair”: the closest pair between an object in the sub-tree pointed by e and an object in the sub-tree pointed by Parent(e).
E.g. along with A, we store the closest pair of objects (o1, o2), where o1 is in subtree(A) and o2 is in subtree(R).
Now, if the distance of the object pair stored at A is no smaller than T, there is no need to process any pair involving A, namely (A,A), (A,B), (A,C), (A,D).
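The pruning test of version 2 can be illustrated with a small Python sketch. The helper name, the grouping of objects under entries A, B, C and the coordinates are all made up for the example. The stored pair is computed by brute force, and objects are assumed to have distinct coordinates so that an object is never paired with itself.

```python
from itertools import product
from math import dist, inf


def local_parent_closest_pair(child_objs, parent_objs):
    """Closest pair (o1, o2) with o1 under the child entry and o2 under its parent."""
    best, best_d = None, inf
    for o1, o2 in product(child_objs, parent_objs):
        if o1 != o2 and dist(o1, o2) < best_d:
            best, best_d = (o1, o2), dist(o1, o2)
    return best, best_d


groups = {"A": [(0, 0), (1, 0)], "B": [(10, 0), (13, 0)], "C": [(20, 0), (20, 5)]}
parent_objs = [o for objs in groups.values() for o in objs]
lp_dist = {name: local_parent_closest_pair(objs, parent_objs)[1]
           for name, objs in groups.items()}

T = 2.5  # current threshold during a query
skippable = [name for name, d in lp_dist.items() if d >= T]
print(lp_dist)    # {'A': 1.0, 'B': 3.0, 'C': 5.0}
print(skippable)  # ['B', 'C']
```

With T = 2.5, no pair involving B or C needs to be processed, including the sibling pair (B,C), which version 1 could only discard through MinDist.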
Performance
Dell Pentium 4, 2.66GHz CPU
XXL library, Java
Both synthetic and real data:
   uniform data (80,000 objects)
   US National Mapping Information (26,700 Massachusetts sites), URL = http://mappings.usgs.gov/www/gnis/
Focus on query time.
Small Query Range
[Figure: query time (ms) versus query range percentage (1%, 5%, 10%) for Incremental Join, SRCPV1 and SRCPV2.]
Large Query Range
[Figure: query time (ms) versus query range percentage (20%, 40%, 60%, 80%, 100%) for Incremental Join, SRCPV1 and SRCPV2.]
Conclusions
We have addressed the spatial closest pair query with a query range.
We have proposed two versions of an index structure called the SRCP-tree.
Our approaches have much better query performance than the existing techniques, especially when the query range is large.
In particular, version 2 of the SRCP-tree is universally the best.