Optimal Planar Point Enclosure Indexing

Lars Arge, Vasilis Samoladas and Ke Yi

Department of Computer Science, Duke University

Technical University of Crete


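Two Dual Problems

Range searching vs. point enclosure (√ = tight bounds known, ? = open):

– Internal memory: range searching √, point enclosure √

– External memory: range searching √, point enclosure ?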


Outline

1. Previous results in internal memory

2. Computation models in external memory

3. Previous results in external memory

4. Our lower bound result

5. Matching upper bound

6. Conclusions


Previous Results: Internal Memory

• Computation model: pointer machine

• Range searching ($T$ is the output size)

– $O(N)$ space, $O(N^{\varepsilon} + T)$ time [BM 80]

– $O(N \log N / \log\log N)$ space, $O(\log N + T)$ time [Chazelle 88]

* Tight for $O(\log^c N + T)$ query structures [Chazelle 90]

– Can do better on a RAM

– Other tradeoffs …

• Point enclosure [Chazelle 86]

– $\Theta(N)$ space, $\Theta(\log N + T)$ time

– Optimal in both space and time


External Memory: Models

• External pointer machine

– Natural generalization of the internal pointer machine

– Each node contains $B$ data objects

– Out-degree $2 \to B$

• Bounding-volume hierarchy (non-replicating index structure)

– Tree structure

– Each object is stored only once

• Indexability model [HKP 97]

[Figure: disk D, processor P, memory M; transfers by block I/O]


External Memory: Models

• Indexability model

– No “structure” at all!

– Only models the layout of data

– Each block contains $B$ data objects

– Can “magically” find the smallest set $\Pi$ of blocks whose union contains all results

– Cost is defined to be $|\Pi|$

[Figure: comparison of models. Bounding-volume hierarchy: R-trees, kd-trees. External pointer machine: all other known results. Indexability model. Label: 1D range searching.]


Previous Results: External Memory

• Range searching ($n = N/B$)

– Similar to internal memory: tradeoff between space and time

– $O(\log_B n + T/B)$ query time

* $O(n \log n / \log\log_B n)$ space [ASV 99]

* Tight in the external pointer machine [SR 95]

* Lower bound extended to the indexability model [ASV 99]

– $O(n)$ space

* $O(\sqrt{n} + T/B)$ time [kdB-tree, GI 99, KS 99]

* Tight in bounding-volume hierarchies

* Can do $O(n^{\varepsilon} + T/B)$ with constant redundancy

* Tight in the indexability model [ASV 99]


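Previous Results: External Memory

• Point enclosure

– $\Omega(\sqrt{n} + T/B)$ for bounding-volume hierarchies [ABGHH 01]

– Easy to get an $O(n)$ space, $O(\log_2 n + T/B)$ query structure

• Summary of bounds, as (space, query time):

– 1D range: internal $(N, \log N + T)$; external $(n, \log_B n + T/B)$

– 1D point enclosure: internal $(N, \log N + T)$; external $(n, \log_B n + T/B)$

– 2D range: internal $(N, N^{\varepsilon} + T)$ and $(N \log N / \log\log N, \log N + T)$; external $(n, n^{\varepsilon} + T/B)$ and $(n \log n / \log\log_B n, \log_B n + T/B)$

– 2D point enclosure: internal $(N, \log N + T)$; external $(n, \log_2 n + T/B)$, $(nB^{\varepsilon}, \log_B n + T/B)$, and $(n, \log_B n + T/B)$?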


Indexability Model in Details

• $N$ data objects laid out in disk blocks, possibly with redundancy

• Each block holds at most $B$ objects

• Cost of a query $q$: minimum number of blocks needed to retrieve all answers

– Can find those blocks without cost

• Redundancy $r$ and access overhead $A$

– $r$: average number of copies of each object in the index

* Size is $rn$ blocks

– $A$: worst-case ratio of the query cost to the ideal cost $\lceil |q|/B \rceil$

* Any query $q$ can be covered by $A\lceil |q|/B \rceil$ blocks ($A \le B$)

• Lower bound expressed as a tradeoff between $r$ and $A$

– 2D range searching [ASV 99]: $r = \Omega(\log n / \log A)$
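To make the three definitions concrete, here is a small brute-force sketch, assuming objects are integers and blocks are Python frozensets. The function names (`query_cost`, `redundancy`, `access_overhead`) and the toy layout are hypothetical; the exhaustive search only stands in for the model's cost-free discovery of the covering blocks $\Pi$, so it is usable on tiny inputs only, and $A$ is measured over a given query list rather than over all queries.

```python
import math
from itertools import combinations
from typing import FrozenSet, Sequence, Set


def query_cost(blocks: Sequence[FrozenSet[int]], q: Set[int]) -> int:
    """|Pi|: the fewest blocks whose union contains every answer in q.

    Exhaustive search over block subsets, so only for tiny examples; the model
    itself assumes these blocks are found at no cost.
    """
    if not q:
        return 0
    relevant = [b for b in blocks if b & q]  # blocks holding at least one answer
    for k in range(1, len(relevant) + 1):
        for combo in combinations(relevant, k):
            if q <= frozenset().union(*combo):
                return k
    raise ValueError("some answer is not stored in any block")


def redundancy(blocks: Sequence[FrozenSet[int]], n_objects: int) -> float:
    """r: average number of stored copies per object (size is r*n blocks if blocks are full)."""
    return sum(len(b) for b in blocks) / n_objects


def access_overhead(blocks: Sequence[FrozenSet[int]], queries: Sequence[Set[int]], B: int) -> float:
    """A: worst case, over the given queries, of |Pi| / ceil(|q|/B)."""
    return max(query_cost(blocks, q) / math.ceil(len(q) / B) for q in queries if q)


# Toy layout: N = 6 objects, B = 3; block {0, 3, 4} duplicates three objects.
blocks = [frozenset({0, 1, 2}), frozenset({3, 4, 5}), frozenset({0, 3, 4})]
print(redundancy(blocks, 6))                                        # 1.5
print(access_overhead(blocks, [{0, 3}, {1, 4}, {0, 1, 2, 3}], B=3))  # 2.0: {1, 4} needs 2 blocks
```

The duplicated block lowers the cost of $\{0, 3\}$ to one block but does nothing for $\{1, 4\}$, which is exactly the kind of tension between $r$ and $A$ that the lower bounds quantify.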


Previous Results in Indexability Model

• Set queries [HKP 97]

– A set $S$ of $N$ objects; queries can be any subset of $S$

– For any $r \le n/B$, $A = B$

– Trivial

• Range searching

– [HKP 97]

– [SP 98]: $r = \Omega\big((\log B / \log A)^{d-1}\big)$ for $d$-dimensional queries

– Only tight for the special case when points form a grid

– [ASV 99]: $r = \Omega(\log n / \log A)$


Redundancy Theorem [SP 98] (Asymptotic version)

For $N$ data objects, if there exist $m$ queries $q_1, \dots, q_m$ such that for any $1 \le i, j \le m$, $i \ne j$,

$|q_i| \ge B$,

$|q_i \cap q_j| \le B/A^2$,

then we have the redundancy

$$r = \Omega\left(\frac{1}{N}\sum_{i=1}^{m} |q_i|\right)$$

• Combinatorial in nature

• Used successfully to obtain the range searching lower bound
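The two conditions are easy to check mechanically. The sketch below, with a hypothetical function name, returns $(1/N)\sum_i |q_i|$ when both conditions hold and `None` otherwise; the theorem only guarantees $r = \Omega(\cdot)$ of this value, so constant factors are not modeled. The query family (rows and columns of a $4 \times 4$ grid of objects) is an illustrative choice.

```python
from itertools import combinations
from typing import List, Optional, Set


def redundancy_bound(queries: List[Set[int]], N: int, B: int, A: float) -> Optional[float]:
    """Return (1/N) * sum(|q_i|) if both conditions of the theorem hold, else None."""
    if any(len(q) < B for q in queries):
        return None                                   # some |q_i| < B
    cap = B / A ** 2
    if any(len(qi & qj) > cap for qi, qj in combinations(queries, 2)):
        return None                                   # some |q_i ∩ q_j| > B/A^2
    return sum(len(q) for q in queries) / N


# Objects are the 16 cells of a 4x4 grid; queries are its 4 rows and 4 columns.
rows = [{4 * i + j for j in range(4)} for i in range(4)]
cols = [{4 * i + j for i in range(4)} for j in range(4)]
print(redundancy_bound(rows + cols, N=16, B=4, A=2))  # 2.0: row ∩ column is 1 cell = B/A^2
print(redundancy_bound(rows + cols, N=16, B=4, A=3))  # None: 1 > 4/9
```

Each cell lies in one row and one column, so the sum counts every object twice; the theorem turns that combinatorial overlap into a redundancy bound.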


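Point Enclosure Lower Bound Construction (1)

• Set of queries: the Fibonacci lattice (a low-discrepancy point set)

– $m$ points in an $m \times m$ grid

– Only property used: any rectangle with area $\alpha m$ contains between $\lfloor \alpha/c_1 \rfloor$ and $\lceil \alpha/c_2 \rceil$ points, for constants $c_1, c_2$

• Set of objects

– Tiling rectangles of size $\alpha t^i \times m/t^i$

– $t = (m/\alpha)^{1/B}$, $i = 1, \dots, B$

– $m = \alpha N/B$

– $\Theta(B \cdot m^2/(\alpha m)) = \Theta(N)$ rectangles are constructed

– $|q_i| \ge B$ is satisfied: each lattice point lies in one rectangle of every layer

$$r = \Omega\left(\frac{1}{N}\sum_{i=1}^{m} |q_i|\right) = \Omega\left(\frac{Bm}{N}\right)$$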
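The construction can be reproduced on a toy scale. This sketch assumes the standard Fibonacci lattice $\{(i, iF_{k-1} \bmod F_k)\}$ with $m = F_k$, and uses half-open tiles so that every lattice point lies in exactly one tile per layer (the slides do not fix a boundary convention). The parameters $m = 144$, $\alpha = 9$, $B = 2$ are chosen only so that $t = 4$ and all tiles divide the square evenly; the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Rect = Tuple[float, float, float, float]  # half-open [x0, x1) x [y0, y1)


def fibonacci_lattice(k: int) -> List[Point]:
    """m = F_k points (i, i * F_{k-1} mod F_k) for i = 0..m-1."""
    fib = [0, 1]
    while len(fib) <= k:
        fib.append(fib[-1] + fib[-2])
    m, f_prev = fib[k], fib[k - 1]
    return [(i, (i * f_prev) % m) for i in range(m)]


def tiling_layers(m: int, alpha: float, B: int) -> List[List[Rect]]:
    """Layer i = 1..B tiles the m x m square with alpha*t^i by m/t^i rectangles."""
    t = (m / alpha) ** (1.0 / B)
    layers = []
    for i in range(1, B + 1):
        w, h = alpha * t ** i, m / t ** i
        nx, ny = round(m / w), round(m / h)  # assumes the tiles divide the square evenly
        layers.append([(a * w, (a + 1) * w, b * h, (b + 1) * h)
                       for a in range(nx) for b in range(ny)])
    return layers


def enclosing_count(layers: List[List[Rect]], p: Point) -> int:
    """|q| for query point p: how many constructed rectangles contain it."""
    return sum(1 for layer in layers for (x0, x1, y0, y1) in layer
               if x0 <= p[0] < x1 and y0 <= p[1] < y1)


points = fibonacci_lattice(12)        # m = F_12 = 144
layers = tiling_layers(144, 9, 2)     # t = (144/9)^(1/2) = 4
print(sum(len(layer) for layer in layers))           # 32 = B*m/alpha = N
print({enclosing_count(layers, p) for p in points})  # {2}: every |q_i| = B
```

With these parameters $N = Bm/\alpha = 32$ rectangles are built and every lattice point is enclosed by exactly $B$ of them, so $\sum_i |q_i| = Bm$ and $Bm/N = \alpha$.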


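Point Enclosure Lower Bound Construction (2)

• Any $A$ that satisfies $|q_i \cap q_j| \le B/A^2$ will become a lower bound

• Make $A$ as large as possible

• Let $x$ and $y$ be the horizontal and vertical distances between $q_1$ and $q_2$. For a rectangle to cover $q_1$ and $q_2$, we must have $\alpha t^i \ge x$ and $m/t^i \ge y$, or $x/\alpha \le t^i \le m/y$

• $q_1$ and $q_2$ are two points from the Fibonacci lattice, so $xy \ge c_2 m$

• Number of such rectangles $\le \log_t \dfrac{m/y}{x/\alpha} + 1 \le \dfrac{B \log(\alpha/c_2)}{\log(N/B)} + 1$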
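The counting argument can be checked numerically on a toy Fibonacci lattice with $m = 144$, $\alpha = 9$, $B = 2$. In this sketch (hypothetical helper names; the standard lattice $\{(i, iF_{k-1} \bmod F_k)\}$ and half-open tiles are assumptions), `shared_layers` counts the layers in which two lattice points fall into the same tile, and `slide_bound` evaluates $1 + \log_t \frac{m\alpha}{xy}$ (clamped so it never drops below 1). The constant $c_2$ is only estimated empirically as $\min xy/m$ over all pairs, not taken from the paper.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[int, int]


def fibonacci_lattice(k: int) -> List[Point]:
    """m = F_k points (i, i * F_{k-1} mod F_k) for i = 0..m-1."""
    fib = [0, 1]
    while len(fib) <= k:
        fib.append(fib[-1] + fib[-2])
    m, f_prev = fib[k], fib[k - 1]
    return [(i, (i * f_prev) % m) for i in range(m)]


def shared_layers(p: Point, q: Point, m: int, alpha: float, B: int) -> int:
    """Number of layers i = 1..B whose half-open tiles put p and q together."""
    t = (m / alpha) ** (1.0 / B)
    count = 0
    for i in range(1, B + 1):
        w, h = alpha * t ** i, m / t ** i
        if p[0] // w == q[0] // w and p[1] // h == q[1] // h:
            count += 1
    return count


def slide_bound(p: Point, q: Point, m: int, alpha: float, B: int) -> float:
    """1 + log_t(m*alpha/(x*y)) admissible layers, clamped below at 1."""
    t = (m / alpha) ** (1.0 / B)
    x, y = abs(p[0] - q[0]), abs(p[1] - q[1])   # both > 0 for distinct lattice points
    return 1 + max(0.0, math.log(m * alpha / (x * y), t))


m, alpha, B = 144, 9, 2
pairs = list(combinations(fibonacci_lattice(12), 2))
c2 = min(abs(p[0] - q[0]) * abs(p[1] - q[1]) for p, q in pairs) / m
worst = max(shared_layers(p, q, m, alpha, B) for p, q in pairs)
print(c2, worst)
```

Two points can share a tile in layer $i$ only if $x < \alpha t^i$ and $y < m/t^i$, so the number of shared layers never exceeds `slide_bound`, and $xy \ge c_2 m$ then caps it by $1 + \log_t(\alpha/c_2)$.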


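Point Enclosure Lower Bound Construction (3)

$$\log_B r = \Omega\left(\frac{1}{A^2}\log_B n\right), \qquad A = \Omega\left(\min\left\{\sqrt{B}, \sqrt{\frac{\log n}{\log r}}\right\}\right)$$

• Disproves the earlier $(n, \log_B n + T/B)$ conjecture

• Still a square-root factor away from matching the upper bound

• What’s wrong? The construction technique, or the model itself?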
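To see the size of the remaining gap, the sketch below evaluates the lower-bound expression with constants dropped and compares it with $\log_r n$. It assumes the comparison point is the $O(\log_r n + T/B)$ index from the upper-bound slides: with redundancy $r$, a query of size about $B$ then costs about $\log_r n$ blocks instead of one. Function names and the sample values of $n$, $r$, $B$ are illustrative only.

```python
import math


def enclosure_lower_A(n: float, r: float, B: float) -> float:
    """min(sqrt(B), sqrt(log n / log r)), the Omega(.) expression without constants."""
    return min(math.sqrt(B), math.sqrt(math.log(n) / math.log(r)))


def slab_tree_A(n: float, r: float) -> float:
    """log_r n: roughly the access overhead of an O(log_r n + T/B) index on a size-B query."""
    return math.log(n) / math.log(r)


n, r, B = 2.0 ** 40, 2.0, 4096.0
lower, upper = enclosure_lower_A(n, r, B), slab_tree_A(n, r)
print(round(lower, 2), round(upper, 2), round(upper / lower, 2))  # 6.32 40.0 6.32
```

The ratio between the two equals the lower bound itself, which is the square-root factor noted on the slide.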


Refine the Indexability Model

$O(\log_B n + |q|/B)$: search cost $\log_B n$ plus retrieval cost $|q|/B$

• Observation: retrieval cost is relatively high for small queries

• Refine: add an additive factor!

– Old: any query $q$ is covered by $A\lceil |q|/B \rceil$ blocks

– New: any query $q$ is covered by $A_0 + A_1\lceil |q|/B \rceil$ blocks

• Modify the Redundancy Theorem accordingly

– The two conditions $|q_i| \ge B$ and $|q_i \cap q_j| \le B/A^2$ become $|q_i| \ge BA_0$ and $|q_i \cap q_j| \le B/A_1^2$


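The Refined Redundancy Theorem

For $N$ data objects, if there exist $m$ queries $q_1, \dots, q_m$ such that for any $1 \le i, j \le m$, $i \ne j$,

$|q_i| \ge BA_0$,

$|q_i \cap q_j| \le B/(2A_1)^2$,

then we have the redundancy

$$r = \Omega\left(\frac{1}{N}\sum_{i=1}^{m} |q_i|\right)$$

Proof Sketch:

Each query can be covered by $A_0 + A_1\lceil |q|/B \rceil \le 2A_1\lceil |q|/B \rceil$ blocks (since $|q| \ge BA_0$ gives $A_0 \le \lceil |q|/B \rceil$, and $A_1 \ge 1$),

and apply the original Redundancy Theorem with $A = 2A_1$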
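The key inequality in the proof sketch can be sanity-checked exhaustively over small parameters. This is only a numeric check under the stated conditions $|q| \ge BA_0$ and $A_1 \ge 1$; restricting to integer values is an assumption of the sketch, not of the model, and the helper name is hypothetical.

```python
import math


def refined_cover_fits_original(q_size: int, B: int, A0: int, A1: int) -> bool:
    """Check A0 + A1*ceil(|q|/B) <= 2*A1*ceil(|q|/B) for one query size."""
    ideal = math.ceil(q_size / B)
    return A0 + A1 * ideal <= 2 * A1 * ideal


# Exhaustive check over small parameters, restricted to |q| >= B*A0 and A1 >= 1.
violations = [
    (q, B, A0, A1)
    for B in range(1, 9)
    for A0 in range(0, 6)
    for A1 in range(1, 6)
    for q in range(max(1, B * A0), B * A0 + 3 * B)
    if not refined_cover_fits_original(q, B, A0, A1)
]
print(violations)  # []
```

Dropping the condition $|q| \ge BA_0$ (for example $|q| = 1$, $B = 4$, $A_0 = 3$, $A_1 = 1$) makes the inequality fail, which is why the refined theorem strengthens the size condition on the queries.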


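Fix the Construction

• Old construction

– $|q| = B$

– $B$ layers of tiling rectangles

– Size of Fibonacci lattice: $m = \alpha N/B$

– Total number of rectangles: $N$

– Result: $\log_B r = \Omega\left(\frac{1}{A^2}\log_B n\right)$, i.e., $A^2 = \Omega\left(\frac{\log n}{\log r}\right)$

• New construction

– $|q| = BA_0$

– $BA_0$ layers of tiling rectangles

– Size of Fibonacci lattice: $m = \alpha N/(BA_0)$

– Total number of rectangles: $N$

– Result: $\log_B r = \Omega\left(\frac{1}{A_0 A_1^2}\log_B n\right)$, i.e., $A_0 A_1^2 = \Omega\left(\frac{\log n}{\log r}\right)$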
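The calculation behind the new bound can be sketched as follows. This is a reconstruction that ignores rounding and constant factors; the specific choice of $\alpha$ and the side conditions $8A_1^2 \le B$, $A_0 \le \sqrt{n}$ and $r \ge 2$ are assumptions made here to keep the algebra short, not statements from the slides. It reuses the pair bound of Construction (2) with $BA_0$ layers in place of $B$.

```latex
\begin{align*}
t &= \Big(\frac{m}{\alpha}\Big)^{1/(BA_0)}, \qquad
\frac{m}{\alpha} = \frac{N}{BA_0} = \frac{n}{A_0},\\
|q_i \cap q_j| &\le 1 + \log_t \frac{\alpha}{c_2}
  = 1 + \frac{BA_0 \log(\alpha/c_2)}{\log(n/A_0)},\\
\alpha &= c_2 \Big(\frac{n}{A_0}\Big)^{1/(8A_0A_1^2)}
  \;\Longrightarrow\;
  |q_i \cap q_j| \le 1 + \frac{B}{8A_1^2} \le \frac{B}{(2A_1)^2}
  \quad (8A_1^2 \le B),\\
r &= \Omega\Big(\frac{1}{N}\sum_{i=1}^{m} |q_i|\Big)
  = \Omega\Big(\frac{m \cdot BA_0}{N}\Big) = \Omega(\alpha)
  \;\Longrightarrow\;
  \log r \ge \frac{\log(n/A_0)}{8A_0A_1^2} - O(1),\\
&\Longrightarrow\; A_0A_1^2 = \Omega\Big(\frac{\log n}{\log r}\Big)
  \quad (A_0 \le \sqrt{n},\ r \ge 2).
\end{align*}
```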


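Range Searching vs. Point Enclosure

• Range searching

– Original model: $r = \Omega\left(\frac{\log n}{\log A}\right)$

– New model: $r = \Omega\left(\frac{\log n}{\log(A_0 A_1)}\right)$

• Point enclosure

– $A_0 A_1^2 = \Omega\left(\frac{\log n}{\log r}\right)$

• Dual bounds in external memory!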
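Ignoring constants, the two bounds read as dual formulas: range searching trades $r$ against $\log(A_0A_1)$, while point enclosure trades $A_0A_1^2$ against $\log r$. The sketch below (hypothetical names, constants dropped, $A_1 = 1$ in the examples, sample $n$ and $B$ chosen for round numbers) shows two regimes: constant redundancy forces an additive overhead of order $\log_2 n$, while redundancy $B^{\varepsilon}$ brings it down to order $\log_B n / \varepsilon$.

```python
import math


def range_min_r(n: float, A0: float, A1: float) -> float:
    """log n / log(A0*A1): range-searching redundancy bound without constants (needs A0*A1 > 1)."""
    return math.log(n) / math.log(A0 * A1)


def enclosure_min_A0(n: float, r: float, A1: float = 1.0) -> float:
    """log n / (A1^2 log r): additive overhead forced by A0*A1^2 = Omega(log n / log r)."""
    return math.log(n) / (A1 ** 2 * math.log(r))


n, B = 2.0 ** 30, 2.0 ** 10
print(round(enclosure_min_A0(n, r=2.0), 6))       # 30.0 = log_2 n with constant redundancy
print(round(enclosure_min_A0(n, r=B ** 0.5), 6))  # 6.0 = 2 log_B n with redundancy B^(1/2)
print(round(range_min_r(n, A0=32.0, A1=1.0), 6))  # 6.0
```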


Matching Upper Bounds (1)

• In the external pointer machine model

• Only interested in the case $A_1 = O(1)$

• Goal: for any $r \le B$, design an index with redundancy $r$ that answers queries in $O(\log_r n + T/B)$ I/Os

• Building block: one-sided segment intersection queries

– Given $N$ horizontal segments

– Report all segments directly above a query point

• Persistent B-tree (modified)

– $O(n)$ space, $O(\log_B n + T/B)$ query

– Search on the x-coordinate of the query point

– Retrieve the segments
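The query pattern of this building block (locate a version by the query's x-coordinate, then search by y and scan upward) can be illustrated with a deliberately naive in-memory stand-in. This is not a persistent B-tree: the hypothetical class below keeps a full y-sorted copy of the active segments for every event x-coordinate and every gap between events, so it can use quadratic space, whereas the slides' structure uses $O(n)$ blocks. Segments are assumed closed, and "above" is taken to include segments at exactly the query's height.

```python
from bisect import bisect_left
from typing import List, Tuple

Segment = Tuple[float, float, float]  # horizontal segment (x_left, x_right, y), closed in x


class VersionedSegments:
    """Naive stand-in for the modified persistent B-tree (hypothetical, quadratic space)."""

    def __init__(self, segments: List[Segment]):
        self.xs = sorted({x for s in segments for x in (s[0], s[1])})
        reps: List[float] = []
        for k, x in enumerate(self.xs):
            reps.append(x)                                # version at an event x-coordinate
            if k + 1 < len(self.xs):
                reps.append((x + self.xs[k + 1]) / 2)     # version for the open gap after it
        self.versions = []
        for x in reps:
            active = sorted((s for s in segments if s[0] <= x <= s[1]), key=lambda s: s[2])
            self.versions.append(([s[2] for s in active], active))

    def above(self, qx: float, qy: float) -> List[Segment]:
        """All segments crossing the vertical line x = qx at height >= qy, sorted by y."""
        if not self.xs or not self.xs[0] <= qx <= self.xs[-1]:
            return []
        i = bisect_left(self.xs, qx)                      # step 1: search on x to pick a version
        idx = 2 * i if self.xs[i] == qx else 2 * i - 1
        ys, active = self.versions[idx]
        return active[bisect_left(ys, qy):]               # step 2: search on y, then retrieve


segs = [(0, 10, 5), (2, 4, 1), (3, 8, 7), (6, 9, 3)]
index = VersionedSegments(segs)
print(index.above(3, 2))   # [(0, 10, 5), (3, 8, 7)]
print(index.above(7, 4))   # [(0, 10, 5), (3, 8, 7)]
```

A persistent B-tree shares unchanged nodes between consecutive versions instead of copying whole snapshots, which is what brings the space down to linear.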


Matching Upper Bounds (2)

• Divide the plane into $r$ horizontal slabs

• Associate two one-sided segment intersection structures with each slab

– One for all top sides of rectangles that cross its bottom boundary

– One for all bottom sides of rectangles that cross its top boundary, and all bottom sides of rectangles that completely span the slab

• Recursively handle rectangles that fall completely within a slab, resulting in a tree with fanout $r$

• Any rectangle is stored at most $r$ times: redundancy is $r$

• Query: follow the tree top-down, asking two one-sided queries at each level: $O(\log_r n \cdot \log_B N + T/B)$ I/Os → $O(\log_r n + T/B)$ by fractional cascading
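The slab decomposition can be sketched in memory as follows. This is a hypothetical illustration, not the authors' implementation: the two one-sided structures per slab are plain lists scanned linearly instead of persistent B-trees, there is no fractional cascading, slab boundaries are placed at evenly spaced ranks of the rectangles' y-coordinates (an assumption), and slabs are half-open in y. It does show the storage rule (a rectangle crossing a slab boundary is stored once per slab it meets, so at most $r$ times, and is not passed further down) and the top-down query that asks two one-sided questions per level.

```python
import math
from bisect import bisect_right
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # closed rectangle (x1, x2, y1, y2)


class SlabNode:
    """One node of a fanout-r slab tree for point enclosure (hypothetical sketch)."""

    def __init__(self, rects: List[Rect], r: int, leaf_size: int = 4):
        self.leaf: Optional[List[Rect]] = None
        if len(rects) <= leaf_size:
            self.leaf = rects
            return
        ys = sorted(y for rc in rects for y in (rc[2], rc[3]))
        # Up to r - 1 slab boundaries at evenly spaced ranks of the y-coordinates.
        self.bounds = sorted({ys[len(ys) * k // r] for k in range(1, r)})
        slabs = len(self.bounds) + 1
        lo = [-math.inf] + self.bounds   # slab k is the half-open range [lo[k], hi[k])
        hi = self.bounds + [math.inf]
        self.up: List[List[Rect]] = [[] for _ in range(slabs)]
        self.down: List[List[Rect]] = [[] for _ in range(slabs)]
        inner: List[List[Rect]] = [[] for _ in range(slabs)]
        for rc in rects:
            first = bisect_right(self.bounds, rc[2])  # slab containing the bottom side
            last = bisect_right(self.bounds, rc[3])   # slab containing the top side
            if first == last:
                inner[first].append(rc)               # falls completely within one slab
                continue
            for k in range(first, last + 1):          # at most r slabs per rectangle
                if rc[2] < lo[k] and rc[3] < hi[k]:
                    self.up[k].append(rc)             # crosses bottom boundary, top side inside
                else:
                    self.down[k].append(rc)           # bottom side inside, or spans slab k
        # Recurse; a child that received every rectangle becomes a leaf to guarantee progress.
        self.children = [
            SlabNode(sub, r, leaf_size if len(sub) < len(rects) else len(sub))
            for sub in inner
        ]

    def query(self, qx: float, qy: float) -> List[Rect]:
        """Report every stored rectangle containing (qx, qy)."""
        if self.leaf is not None:
            return [rc for rc in self.leaf
                    if rc[0] <= qx <= rc[1] and rc[2] <= qy <= rc[3]]
        k = bisect_right(self.bounds, qy)
        # One-sided query 1: top sides at or above qy (their bottoms lie below the slab).
        out = [rc for rc in self.up[k] if rc[0] <= qx <= rc[1] and qy <= rc[3]]
        # One-sided query 2: bottom sides at or below qy (their tops lie above the slab).
        out += [rc for rc in self.down[k] if rc[0] <= qx <= rc[1] and rc[2] <= qy]
        return out + self.children[k].query(qx, qy)


rects = [(0, 4, 0, 10), (1, 3, 2, 3), (2, 6, 5, 9), (5, 8, 1, 2), (0, 9, 4, 6), (3, 4, 7, 8)]
tree = SlabNode(rects, r=3, leaf_size=2)
print(sorted(tree.query(3.5, 7.5)))  # [(0, 4, 0, 10), (2, 6, 5, 9), (3, 4, 7, 8)]
```

Replacing each list with a one-sided segment intersection structure and linking the levels by fractional cascading is what the slide credits for the $O(\log_r n + T/B)$ query bound; the lists here only make the decomposition easy to inspect.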


Conclusions

1. A tight lower bound on the tradeoff between the redundancy and access overhead of any index for 2D point enclosure queries, in the new indexability model

2. A matching upper bound in the external pointer machine

The END