Optimal Planar Point Enclosure Indexing
Lars Arge, Vasilis Samoladas and Ke Yi
Department of Computer Science
Duke University
Technical University of Crete
Two Dual Problems
Problem | Internal memory | External memory
Range searching | √ | √
Point enclosure | √ | ?
Outline
1. Previous results in internal memory
2. Computation models in external memory
3. Previous results in external memory
4. Our lower bound result
5. Matching upper bound
6. Conclusions
Previous Results: Internal Memory
• Computation model: Pointer machine
• Range searching (T is the output size)
– $O(N)$ space, $O(N^{\varepsilon} + T)$ time [BM 80]
– $O(N \log N / \log\log N)$ space, $O(\log N + T)$ time [Chazelle 88]
* Tight for $O(\log^{c} N + T)$ query structures [Chazelle 90]
– Can do better on a RAM
– Other tradeoffs …
• Point enclosure [Chazelle 86]
– $\Theta(N)$ space, $\Theta(\log N + T)$ time
– Optimal in both space and time
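The two query types compared on this slide can be stated as linear scans. This is only a reference definition of what each query returns, not one of the cited structures; the function names and tuple layouts are illustrative assumptions.

```python
def range_search(points, x1, x2, y1, y2):
    """Range searching: report the points inside the query rectangle."""
    return [(x, y) for (x, y) in points if x1 <= x <= x2 and y1 <= y <= y2]


def point_enclosure(rects, qx, qy):
    """Point enclosure: report the rectangles (x1, x2, y1, y2) containing the query point."""
    return [r for r in rects if r[0] <= qx <= r[1] and r[2] <= qy <= r[3]]


points = [(1, 1), (4, 2), (6, 5)]
rects = [(0, 5, 0, 3), (3, 7, 1, 6), (6, 9, 0, 1)]

inside = range_search(points, 0, 5, 0, 3)   # points in the first rectangle
around = point_enclosure(rects, 4, 2)       # rectangles around the point (4, 2)
```

Both scans touch every stored object. The structures cited above replace the scan with a search step plus a cost proportional to the output size T.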
External Memory: Models
• External pointer machine
– Natural generalization of the internal pointer machine
– Each node contains B data objects
– Out-degree generalizes from 2 to B
• Bounding-volume hierarchy (Non-replicating index structure)
– Tree structure
– Each object is stored only once
• Indexability model [HKP 97]
[Figure: diagram with components labeled D, P, and M, and the label "Block I/O"]
External Memory: Models
• Indexability model
– No “structure” at all!
– Only models layout of data
– Each block contains B data objects
– Can “magically” find the smallest set Π of blocks whose union contains all results
– Cost is defined to be |Π|
[Figure: the three models and the results known in each. Bounding-volume hierarchy: R-trees, kd-trees. External pointer machine: all other known results. Indexability model. 1D range searching.]
Previous Results: External Memory
• Range searching ($n = N/B$)
– Similar to internal memory: tradeoff between space and time
– $O(\log_B n + T/B)$ query time
* $O(n \log n / \log\log_B n)$ space [ASV 99]
– Tight in external pointer machine [SR 95]
– Improved to indexability model [ASV 99]
– $O(n)$ space
* $O(\sqrt{n} + T/B)$ time [kdB-tree, GI 99, KS 99]
– Tight in bounding-volume hierarchies
* Can do $O(n^{\varepsilon} + T/B)$ with constant redundancy
– Tight in indexability model [ASV 99]
Previous Results: External Memory
• Point enclosure
– $\Omega(\sqrt{n} + T/B)$ for bounding-volume hierarchies [ABGHH 01]
– Easy to get an $O(n)$ space, $O(\log_2 n + T/B)$ query structure
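Entries are (space, query time).
Problem | Internal memory | External memory
1D range | $(N, \log N + T)$ | $(n, \log_B n + T/B)$
1D point enclosure | $(N, \log N + T)$ | $(n, \log_B n + T/B)$
2D range | $(N, N^{\varepsilon} + T)$; $(N \log N / \log\log N, \log N + T)$ | $(n, n^{\varepsilon} + T/B)$; $(n \log n / \log\log_B n, \log_B n + T/B)$
2D point enclosure | $(N, \log N + T)$ | $(n, \log_B n + T/B)$?; $(n, \log_2 n + T/B)$; $(nB^{\varepsilon}, \log_B n + T/B)$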
Indexability Model in Detail
• N data objects laid out in disk blocks, possibly with redundancy
• Each block holds at most B objects
• Cost of a query q: minimum number of blocks needed to retrieve all answers
– Can find those blocks without cost
• Redundancy r and access overhead A
– r: average number of copies of an object in the index
* Size is rn blocks
– A: worst-case ratio of the query cost to the ideal cost $\lceil |q|/B \rceil$
* Any query can be covered by $A \lceil |q|/B \rceil$ blocks ($A \le B$)
• Lower bound expressed as a tradeoff between r and A
– 2D range searching [ASV 99]: $r = \Omega(\log n / \log A)$
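To make the two parameters concrete, the following Python sketch computes the redundancy and the access overhead of a toy block layout by brute force. The function names and the toy instance are illustrative assumptions, and the exhaustive search over block subsets is only feasible for very small inputs.

```python
from itertools import combinations
from math import ceil


def redundancy(blocks, num_objects):
    """Average number of stored copies per object."""
    return sum(len(b) for b in blocks) / num_objects


def query_cost(blocks, query):
    """Minimum number of blocks whose union contains the query (brute force)."""
    query = set(query)
    if not query:
        return 0
    for k in range(1, len(blocks) + 1):
        for subset in combinations(blocks, k):
            if query <= set().union(*subset):
                return k
    raise ValueError("query is not covered by the layout")


def access_overhead(blocks, queries, B):
    """Worst-case ratio of the query cost to the ideal cost ceil(|q| / B)."""
    return max(query_cost(blocks, q) / ceil(len(q) / B) for q in queries)


# Toy instance: N = 4 objects, block size B = 2, two objects stored twice.
B = 2
blocks = [{0, 1}, {2, 3}, {1, 2}]
queries = [{0, 1}, {1, 2}, {0, 3}]
r = redundancy(blocks, 4)
A = access_overhead(blocks, queries, B)
```

In this instance the query {0, 3} fits in one block ideally but needs two blocks of the layout, so it determines the access overhead.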
Previous Results in Indexability Model
• Set queries [HKP 97]
– A set S of N objects, queries can be any subset of S
– For any $r \le n/B$, $A = B$
– Trivial
• Range searching
– [HKP 97]: $r = \Omega\left(\frac{\log B}{A \log A}\right)$
– [SP 98]: $r = \Omega\left(\left(\frac{\log B}{\log A}\right)^{d-1}\right)$
– Only tight for the special case when points form a grid
– [ASV 99]: $r = \Omega\left(\frac{\log n}{\log A}\right)$
Redundancy Theorem [SP 98] (Asymptotic version)
For N data objects, if there exist m queries $q_1, \ldots, q_m$ such that for any $1 \le i, j \le m$, $i \ne j$,
$|q_i| \ge B$,
$|q_i \cap q_j| \le B/A^2$,
then the redundancy satisfies
$r = \Omega\left(\frac{1}{N} \sum_{i=1}^{m} |q_i|\right)$
• Combinatorial in nature
• Used successfully to obtain the range searching lower bound
Point Enclosure Lower Bound Construction (1)
• Set of queries: the Fibonacci lattice (a low-discrepancy point set)
– m points in an $m \times m$ grid
– Only property used: any rectangle with area $\alpha m$ contains between $\alpha/c_1$ and $\alpha/c_2$ points
• Set of objects
– Tiling rectangles of size $\alpha t^i \times m/t^i$
– $t = (m/\alpha)^{1/B}$, $i = 1, \ldots, B$
– $m = \alpha N/B$
– $\Theta(B \cdot m^2/(\alpha m)) = \Theta(N)$ rectangles are constructed
– $|q_i| \ge B$ is satisfied
$r = \Omega\left(\frac{1}{N} \sum_{i=1}^{m} |q_i|\right) = \Omega\left(\frac{Bm}{N}\right)$
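The query set can be generated in a few lines. The sketch below assumes the standard definition of the Fibonacci lattice, points $(i, i \cdot F_{k-1} \bmod F_k)$ with $m = F_k$; the slide does not spell the definition out, and the helper names are illustrative.

```python
def fibonacci_lattice(k):
    """Return (m, points) with m = F_k and points (i, i * F_{k-1} mod F_k)."""
    prev, cur = 1, 1                    # F_1, F_2
    for _ in range(k - 2):
        prev, cur = cur, prev + cur
    m = cur
    return m, [(i, (i * prev) % m) for i in range(m)]


def count_in_box(points, x0, y0, width, height):
    """Count points in the half-open box [x0, x0 + width) x [y0, y0 + height)."""
    return sum(1 for (x, y) in points
               if x0 <= x < x0 + width and y0 <= y < y0 + height)


m, lattice = fibonacci_lattice(10)      # m = F_10 = 55
alpha = 5
# Boxes of area alpha * m = 275 anchored at the origin, with different aspect ratios.
shapes = [(55, 5), (25, 11), (11, 25), (5, 55)]
counts = [count_in_box(lattice, 0, 0, w, h) for (w, h) in shapes]
```

Because consecutive Fibonacci numbers are coprime, the y-coordinates form a permutation of 0..m-1, so every row and every column of the grid holds exactly one point. The four sample boxes each contain 5 points, which is the kind of aspect-ratio-independent count that the construction relies on.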
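Point Enclosure Lower Bound Construction (2)
• Any A that satisfies $|q_i \cap q_j| \le B/A^2$ will become a lower bound
• Make A as large as possible
• For a rectangle to cover $q_1$ and $q_2$, where x and y are the horizontal and vertical distances between the two points, we must have $\alpha t^i \ge x$ and $m/t^i \ge y$, that is, $x/\alpha \le t^i \le m/y$
• $q_1$ and $q_2$ are two points from the Fibonacci lattice, so $xy \ge c_2 m$
• # such rectangles $\le \log_t \frac{m/y}{x/\alpha} + 1 \le \frac{B \log(\alpha/c_2)}{\log(N/B)} + 1$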
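The counting argument can be checked numerically. The sketch below lists the layers i whose tile dimensions satisfy $x/\alpha \le t^i \le m/y$ and compares their number with the bound $1 + \log_t \frac{m/y}{x/\alpha}$. The function names and sample values are illustrative assumptions, and the code only tests the size condition, not the position of the tiles.

```python
from math import log


def covering_layers(alpha, m, B, x, y):
    """Layers i in 1..B whose tiles (alpha * t^i wide, m / t^i tall)
    are large enough to span an x-by-y bounding box."""
    t = (m / alpha) ** (1.0 / B)
    return [i for i in range(1, B + 1) if x / alpha <= t ** i <= m / y]


def layer_bound(alpha, m, B, x, y):
    """The bound 1 + log_t((m / y) / (x / alpha)) on the number of such layers."""
    t = (m / alpha) ** (1.0 / B)
    return 1 + log((m / y) / (x / alpha)) / log(t)


# Sample values: m / alpha = 1024 and B = 10 give t = 2.
layers = covering_layers(alpha=4, m=4096, B=10, x=40, y=100)
bound = layer_bound(alpha=4, m=4096, B=10, x=40, y=100)
```

Here the condition reads $10 \le 2^i \le 40.96$, so only layers 4 and 5 qualify, within the bound of about 3.03.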
Point Enclosure Lower Bound Construction (3)
$1 + \frac{B \log r}{\log n} \le \frac{B}{A^2}$
$A = \Omega\left(\min\left\{\sqrt{\frac{\log n}{\log r}}, \sqrt{B}\right\}\right)$
• Disproves the earlier $(n, \log_B n + T/B)$ conjecture
• Still a square root factor away
• What's wrong? The construction technique, or the model itself?
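The step from the intersection bound to the bound on A can be spelled out as follows. This is a sketch under two assumptions not stated on the slide: $\alpha$ is taken proportional to r (motivated by $r = \Omega(Bm/N)$ with $m = \alpha N/B$), and constants such as $c_2$ are absorbed into the asymptotic notation.

```latex
\begin{align*}
|q_i \cap q_j| &\le 1 + \frac{B \log(\alpha/c_2)}{\log(N/B)}, \\
1 \le \frac{B}{2A^2}
\ \text{and}\
\frac{B \log(\alpha/c_2)}{\log(N/B)} \le \frac{B}{2A^2}
&\implies |q_i \cap q_j| \le \frac{B}{A^2}, \\
A^2 &= \min\left\{ \frac{B}{2},\ \frac{\log(N/B)}{2 \log(\alpha/c_2)} \right\}
\ \text{satisfies both conditions.}
\end{align*}
```

With $n = N/B$ and $\alpha = \Theta(r)$, this choice of A is $\Theta\left(\min\left\{\sqrt{\log n / \log r}, \sqrt{B}\right\}\right)$, which is where the square root comes from.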
Refine the Indexability Model
$O(\log_B n + |q|/B)$: search cost $\log_B n$, retrieval cost $|q|/B$
• Observation: retrieval cost is relatively high for small queries
• Refine: add an additive factor!
– Old: any query q is covered by $A \lceil |q|/B \rceil$ blocks
– New: any query q is covered by $A_0 + A_1 \lceil |q|/B \rceil$ blocks
• Modify the Redundancy Theorem accordingly
– The two conditions:
* Old: $|q_i| \ge B$, $|q_i \cap q_j| \le B/A^2$
* New: $|q_i| \ge BA_0$, $|q_i \cap q_j| \le B/A_1^2$
The Refined Redundancy Theorem
For N data objects, if there exist m queries $q_1, \ldots, q_m$ such that for any $1 \le i, j \le m$, $i \ne j$,
$|q_i| \ge BA_0$,
$|q_i \cap q_j| \le B/(2A_1)^2$,
then the redundancy satisfies
$r = \Omega\left(\frac{1}{N} \sum_{i=1}^{m} |q_i|\right)$
Proof sketch:
Each query can be covered by $A_0 + A_1 \lceil |q|/B \rceil \le 2A_1 \lceil |q|/B \rceil$ blocks, so the original Redundancy Theorem applies with $A = 2A_1$.
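The inequality used in the proof sketch follows directly from the size condition on the queries. The derivation below assumes $A_1 \ge 1$, which is not stated explicitly on the slide.

```latex
\begin{align*}
|q| \ge B A_0
  &\implies \left\lceil \frac{|q|}{B} \right\rceil \ge A_0 \\
  &\implies A_0 \le A_1 \left\lceil \frac{|q|}{B} \right\rceil
  \qquad \text{(since $A_1 \ge 1$)} \\
  &\implies A_0 + A_1 \left\lceil \frac{|q|}{B} \right\rceil
      \le 2 A_1 \left\lceil \frac{|q|}{B} \right\rceil .
\end{align*}
```

For queries of size at least $BA_0$, the additive term is therefore absorbed by doubling the multiplicative one, which is why the intersection condition carries the factor $(2A_1)^2$.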
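Fix the Construction
• Old construction
– $|q| = B$
– B layers of tiling rectangles
– Size of Fibonacci lattice: $m = \alpha N/B$
– Total # rectangles: N
– $1 + \frac{B \log r}{\log n} \le \frac{B}{A^2}$, giving $A^2 = \Omega\left(\frac{\log n}{\log r}\right)$
• New construction
– $|q| = BA_0$
– $BA_0$ layers of tiling rectangles
– Size of Fibonacci lattice: $m = \alpha N/(BA_0)$
– Total # rectangles: N
– $1 + \frac{BA_0 \log r}{\log n} \le \frac{B}{A_1^2}$, giving $A_0 A_1^2 = \Omega\left(\frac{\log n}{\log r}\right)$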
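A quick calculation shows that both constructions produce N rectangles in total even though the lattice shrinks. The sketch below assumes the step $t = (m/\alpha)^{1/(BA_0)}$ for the new construction, by analogy with the old one; the function name and sample values are illustrative.

```python
def construction_parameters(N, B, alpha, A0=1):
    """Parameters of the tiling construction; A0 = 1 gives the old construction."""
    layers = B * A0                      # number of tiling layers
    m = alpha * N / layers               # size of the Fibonacci lattice
    t = (m / alpha) ** (1.0 / layers)    # aspect-ratio step between layers (assumed form)
    per_layer = (m * m) / (alpha * m)    # tiles of area alpha * m covering the m x m grid
    return {"layers": layers, "m": m, "t": t, "rectangles": layers * per_layer}


old = construction_parameters(N=2**20, B=64, alpha=4)
new = construction_parameters(N=2**20, B=64, alpha=4, A0=4)
```

With $A_0 = 4$ the lattice has a quarter of the points and each layer a quarter of the tiles, but there are four times as many layers, so each query point is still covered by $BA_0$ rectangles and the total stays at N.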
Range Searching vs. Point Enclosure
• Range searching
– Original model: $r = \Omega\left(\frac{\log n}{\log A}\right)$
– New model: $r = \Omega\left(\frac{\log n}{\log(A_0 A_1)}\right)$
• Point enclosure: $A_0 A_1^2 = \Omega\left(\frac{\log n}{\log r}\right)$
• Dual bounds in external memory!
Matching Upper Bounds (1)
• In the external pointer machine model
• Only interested in the case $A_1 = O(1)$
• Goal: for any $r \le B$, design an index with redundancy r that answers a query in $O(\log_r n + T/B)$ I/Os
• Building block: one-sided segment intersection queries
– Given N horizontal segments
– Report all segments directly above a query point
• Persistent B-tree (modified)
– $O(n)$ space, $O(\log_B n + T/B)$ query
– Search on the x-coordinate of the query point
– Retrieve the segments
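As a reference for what the building block returns, here is a linear-scan version of the one-sided segment intersection query. It is not the persistent B-tree from the slide, only a specification of the answer; the `Segment` type and function name are illustrative assumptions.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    x1: float
    x2: float
    y: float


def segments_above(segments: List[Segment], qx: float, qy: float) -> List[Segment]:
    """Report the horizontal segments hit by the vertical ray going up from (qx, qy),
    ordered from bottom to top."""
    hits = [s for s in segments if s.x1 <= qx <= s.x2 and s.y >= qy]
    return sorted(hits, key=lambda s: s.y)


segments = [Segment(0, 4, 1), Segment(2, 6, 3), Segment(5, 9, 2), Segment(1, 3, 5)]
above = segments_above(segments, 2.5, 2)
```

The scan costs time linear in N. The persistent B-tree achieves the same answer by first searching on the x-coordinate and then retrieving the segments in order.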
Matching Upper Bounds (2)
• Divide the plane into r horizontal slabs
• Associate two one-sided segment intersection structures with each slab
– One for all top sides of rectangles that cross its bottom boundary
– One for all bottom sides of rectangles that cross its top boundary and all bottom sides of rectangles that completely span the slab
• Recursively handle rectangles that fall completely within a slab, resulting in a tree with fanout r
• Any rectangle is stored at most r times: redundancy is r
• Query: follow the tree top-down, asking two one-sided queries at each level. $O(\log_r n \cdot \log_B N + T/B)$ I/Os → $O(\log_r n + T/B)$ by fractional cascading
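The slab decomposition can be sketched in memory as follows. This is a simplified illustration, not the structure from the talk: the one-sided structures are plain lists that are scanned, slabs have equal height rather than being balanced, and there is no fractional cascading. The class and parameter names (`SlabNode`, `leaf_size`) are assumptions.

```python
import random
from bisect import bisect_right
from typing import List, NamedTuple


class Rect(NamedTuple):
    x1: float
    x2: float
    y1: float
    y2: float


def contains(rc, qx, qy):
    return rc.x1 <= qx <= rc.x2 and rc.y1 <= qy <= rc.y2


class SlabNode:
    def __init__(self, rects: List[Rect], r: int, leaf_size: int = 4):
        ys = [v for rc in rects for v in (rc.y1, rc.y2)]
        lo, hi = (min(ys), max(ys)) if ys else (0.0, 0.0)
        self.leaf = len(rects) <= leaf_size or lo == hi
        if self.leaf:
            self.rects = list(rects)
            return
        # Lower boundaries of r equal-height slabs covering [lo, hi].
        self.bounds = [lo + k * (hi - lo) / r for k in range(r)]
        self.tops = [[] for _ in range(r)]     # top side in the slab, crosses its bottom boundary
        self.bottoms = [[] for _ in range(r)]  # crosses the slab's top boundary (or spans the slab)
        inside = [[] for _ in range(r)]
        for rc in rects:
            a, b = self._slab(rc.y1), self._slab(rc.y2)
            if a == b:
                inside[a].append(rc)           # falls completely within one slab
                continue
            for k in range(a, b):
                self.bottoms[k].append(rc)
            self.tops[b].append(rc)
        # Guard: a child that did not shrink is forced to be a leaf.
        self.children = [
            SlabNode(part, r, leaf_size if len(part) < len(rects) else len(part))
            for part in inside
        ]

    def _slab(self, y):
        return max(0, bisect_right(self.bounds, y) - 1)

    def query(self, qx, qy):
        if self.leaf:
            return [rc for rc in self.rects if contains(rc, qx, qy)]
        k = self._slab(qy)
        # One-sided query upward on top sides, downward on bottom sides.
        found = [rc for rc in self.tops[k] if rc.x1 <= qx <= rc.x2 and rc.y2 >= qy]
        found += [rc for rc in self.bottoms[k] if rc.x1 <= qx <= rc.x2 and rc.y1 <= qy]
        return found + self.children[k].query(qx, qy)


rng = random.Random(0)
rects = []
for _ in range(200):
    x, y = rng.uniform(0, 90), rng.uniform(0, 90)
    rects.append(Rect(x, x + rng.uniform(1, 30), y, y + rng.uniform(1, 30)))
tree = SlabNode(rects, r=4)
```

A rectangle that crosses slab boundaries is stored once per slab it touches at that node, so at most r times, and nowhere deeper in the tree. Each list is filtered on only one side of the rectangle because the other side is guaranteed by the slab boundary it crosses.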