
CSIS7101 – Advanced Database Technologies

Spatio-Temporal Data (Part 3)

The MV3R-Tree: A Spatio-Temporal Access Method for Timestamp and Interval Queries

Kwong Chi Ho Leo
Wong Chi Kwong Simon
Lui, Tak Sing Arthur

Introduction

Spatio-Temporal Database Management Systems (STDBMS)
  Mobile phone systems
    Track users efficiently
    Provide better communication services
  Traffic supervision systems
    Monitor vehicle locations
    Motion patterns
  Urban planning
    Record the development of landscapes over the years
    Retrieve urban situations at any given time in the past

Spatio-Temporal Queries

Traditional STDBMS
  Focus on static objects
  Updating the database whenever objects change their positions would make the STDBMS spend most of its time just handling updates
  It would also result in huge space requirements
To deal with objects that have dynamic behavior, new query languages, modeling methods, novel attribute representations and specialized access methods are needed
  STR-trees and TB-trees
    Focus on efficient trajectory retrieval
  TPR-trees
    Focus on predicting objects’ future locations by storing their current positions and velocities

Spatio-Temporal Queries (Continued)

For historical information retrieval, window queries about objects that move in discrete time are commonly used
  Timeslice or timestamp queries
    Retrieve all objects that intersect a window at a specific timestamp
  Interval queries
    Include several consecutive timestamps

Historical Information Retrieval

Types of indexing
  MR-trees and HR-trees
    Maintain a separate R-tree for each timestamp, but allow consecutive trees to share branches
    Advantages
      Efficient for timestamp queries, as the search degenerates into a static spatial window query, for which R-trees are very efficient
    Disadvantages
      Extensive duplication of objects (even if they do not move), which leads to huge space requirements for most typical applications
      Poor performance on interval queries

Historical Information Retrieval (Continued)

3-dimensional R-trees (the third dimension corresponds to time)
  An object that does not change its position during a certain period of time is modeled as a 3D box bounding both its spatial and temporal attributes
  A moving object is modeled by multiple boxes, each corresponding to a different version
  Advantages
    The temporal attribute is tightly integrated with the spatial attributes, so interval queries can be answered efficiently
    Redundant duplication is avoided, so space usage can be reduced

Historical Information Retrieval (Continued)

  Disadvantages
    Poor performance on timestamp queries, as the query time depends on the total number of entries in the history rather than on the number of live entries at the query timestamp

[Figure: two 3D plots with axes x, y and time. Left: an object that stays at the same location from t1 to t2. Right: an object that moves from location A to location B at time t2, drawn as one box at location A and one at location B, with the time axis marked t1, t2 and t3.]

Introduction to the MV3R-tree

The MV3R-tree combines a multi-version R-tree (MVR-tree) with a small auxiliary 3D R-tree
  The auxiliary 3D R-tree is built on the leaves of the MVR-tree
Aims at timestamp and interval window queries
  For retrieving the past locations of discretely moving objects
Enhances the performance of the multi-version framework for multi-dimensional access methods

Introduction to the MV3R-tree (Continued)

The MVR-tree involves several heuristics that take into account the features of R-trees to improve performance significantly
The auxiliary 3D R-tree outperforms traditional 3D R-trees for most queries
MV3R-tree space consumption is up to an order of magnitude smaller than that of an HR-tree, while maintaining comparable timestamp query performance

Overview of MVB-trees, HR-trees & 3D R-trees

Multi-version B-trees (MVB-trees)
  Extensions of B-trees
  Index the evolution of one-dimensional data in transaction-time temporal databases
  Insertions and deletions can only happen at the current time
  Entries have the form <key, tstart, tend, pointer>
    For root and intermediate entries, the pointer points to a next-level node
    For leaf entries, the pointer points to the actual record with the corresponding key value
  An entry is said to be alive at time t if tstart ≤ t < tend
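
The entry layout and the liveness test can be sketched in a few lines of Python. This is only an illustration: the `Entry` class, the use of `None` for an entry that has not been deleted yet, and the sample key are assumptions made for this sketch, not part of the original material.

```python
from dataclasses import dataclass
from typing import Optional

NOW = None  # assumption: None marks an entry that has not been deleted yet


@dataclass
class Entry:
    key: int
    tstart: int
    tend: Optional[int] = NOW  # open-ended while the entry is alive
    pointer: object = None     # child node or record; unused in this sketch


def is_alive(entry: Entry, t: int) -> bool:
    """Return True if tstart <= t < tend, treating an open tend as infinity."""
    return entry.tstart <= t and (entry.tend is NOW or t < entry.tend)


e = Entry(key=95, tstart=1, tend=3)
print([is_alive(e, t) for t in (0, 1, 2, 3)])  # [False, True, True, False]
```

The interval is half-open, so an entry deleted at time 3 is no longer alive at time 3.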

Overview of MVB-trees (Continued)

The temporal attributes tstart and tend denote the times at which the record was inserted into and deleted from the database, respectively
Deletions are logical, i.e. actual records are not physically removed from the database
For currently live entries, tend is denoted as *, where * means “NOWTIME”
The tree can have multiple roots, and each root has a jurisdiction interval
For each timestamp t, every node except the roots must have either no live entries or at least b · Pversion live entries at t (the weak version condition)
  Pversion = tree parameter
  b = node capacity

Overview of MVB-trees (Continued)

Example: Pversion = 1/3, b = 6, so the minimum number of live entries is 1/3 × 6 = 2

  Root: <5, 1, *, A> <43, 1, *, B> <72, 1, *, C>
  A: <5, 1, *> <8, 1, *> <13, 1, *> <25, 1, 3> <27, 1, 3> <29, 1, 3>
  B: <43, 1, *> <48, 1, *> <52, 1, 2> <59, 1, 3> <68, 1, 3>
  C: <72, 1, *> <78, 1, *> <83, 1, *> <95, 1, 3> <99, 1, *> <102, 1, *>

Each root entry points to a leaf node (A, B or C) that was created at time 1 and is alive
Key 43 in leaf node B was created at time 1 and is still alive
Key 95 in leaf node C was created at time 1 and deleted at time 3
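
As a rough check of this example, the sketch below counts the live entries of each leaf at timestamps 1 to 3 and tests them against the minimum of b · Pversion = 2. The tuple layout and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

B = 6                        # node capacity b from the example
P_VERSION = Fraction(1, 3)   # tree parameter Pversion from the example
STAR = None                  # assumption: None stands for "*" (still alive)

# (key, tstart, tend) triples copied from the example leaves
LEAVES = {
    "A": [(5, 1, STAR), (8, 1, STAR), (13, 1, STAR), (25, 1, 3), (27, 1, 3), (29, 1, 3)],
    "B": [(43, 1, STAR), (48, 1, STAR), (52, 1, 2), (59, 1, 3), (68, 1, 3)],
    "C": [(72, 1, STAR), (78, 1, STAR), (83, 1, STAR), (95, 1, 3), (99, 1, STAR), (102, 1, STAR)],
}


def live_count(entries, t):
    """Number of entries with tstart <= t < tend."""
    return sum(1 for _, ts, te in entries if ts <= t and (te is STAR or t < te))


def satisfies_weak_version(entries, t):
    """Either no live entries at t, or at least b * Pversion of them."""
    n = live_count(entries, t)
    return n == 0 or n >= B * P_VERSION


for name, entries in LEAVES.items():
    print(name, [live_count(entries, t) for t in (1, 2, 3)])
# A [6, 6, 3]
# B [5, 4, 2]
# C [6, 6, 5]
```

Leaf B is exactly at the minimum at timestamp 3, so one more deletion in B would leave it with too few live entries.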

Overview of MVB-trees (Continued)

Insertions

Before the insertion
  Root: <5, 1, *, A> <43, 1, *, B> <72, 1, *, C>
  A: <5, 1, *> <8, 1, *> <13, 1, *> <25, 1, 3> <27, 1, 3> <29, 1, 3>
  B: <43, 1, *> <48, 1, *> <52, 1, 2> <59, 1, 3> <68, 1, 3>
  C: <72, 1, *> <78, 1, *> <83, 1, *> <95, 1, 3> <99, 1, *> <102, 1, *>

After the insertion
  Root: <5, 1, 4, A> <5, 4, *, D> <43, 1, *, B> <72, 1, *, C>
  A: <5, 1, 4> <8, 1, 4> <13, 1, 4> <25, 1, 3> <27, 1, 3> <29, 1, 3>
  D: <5, 4, *> <8, 4, *> <13, 4, *> <28, 4, *>
  B: <43, 1, *> <48, 1, *> <52, 1, 2> <59, 1, 3> <68, 1, 3>
  C: <72, 1, *> <78, 1, *> <83, 1, *> <95, 1, 3> <99, 1, *> <102, 1, *>

The insertion of <28, 4, *> at timestamp 4 causes node A to overflow
A new node D is created to store the live entries of A
A “dies”, meaning that it will not be modified in the future
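
The overflow case above can be replayed with a small sketch. It only covers the version split itself; the strong version checks that follow a version split are left out, and the function names and tuple layout are assumptions made for illustration.

```python
STAR = None  # assumption: None stands for "*" (still alive)
B = 6        # node capacity from the example


def version_split(entries, t):
    """Close every live entry at time t and copy it into a new node."""
    dead_node, new_node = [], []
    for key, ts, te in entries:
        if te is STAR:
            dead_node.append((key, ts, t))    # the old copy ends at t
            new_node.append((key, t, STAR))   # the live copy starts at t
        else:
            dead_node.append((key, ts, te))   # already dead: unchanged
    return dead_node, new_node


def insert(entries, key, t):
    """Insert key at time t, version-splitting first if the node is full."""
    if len(entries) < B:
        return entries + [(key, t, STAR)], None
    dead_node, new_node = version_split(entries, t)
    new_node.append((key, t, STAR))
    new_node.sort(key=lambda e: e[0])
    return dead_node, new_node


node_a = [(5, 1, STAR), (8, 1, STAR), (13, 1, STAR), (25, 1, 3), (27, 1, 3), (29, 1, 3)]
old_a, node_d = insert(node_a, 28, 4)
print(old_a)   # entries of the dead node A
print(node_d)  # entries of the new node D
```

With the example data, `old_a` matches node A after the insertion and `node_d` matches node D.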

Overview of MVB-trees (Continued)

Deletion

Before the deletion
  Root: <5, 1, *, A> <43, 1, *, B> <72, 1, *, C>
  A: <5, 1, *> <8, 1, *> <13, 1, *> <25, 1, 3> <27, 1, 3> <29, 1, 3>
  B: <43, 1, *> <48, 1, *> <52, 1, 2> <59, 1, 3> <68, 1, 3>
  C: <72, 1, *> <78, 1, *> <83, 1, *> <95, 1, 3> <99, 1, *> <102, 1, *>

After the deletion
  Root: <5, 1, *, A> <43, 1, 4, B> <72, 1, 4, C> <43, 4, *, D> <83, 4, *, E>
  A: <5, 1, *> <8, 1, *> <13, 1, *> <25, 1, 3> <27, 1, 3> <29, 1, 3>
  B: <43, 1, 4> <48, 1, 4> <52, 1, 2> <59, 1, 3> <68, 1, 3>
  C: <72, 1, 4> <78, 1, 4> <83, 1, 4> <95, 1, 3> <99, 1, 4> <102, 1, 4>
  D: <43, 4, *> <72, 4, *> <78, 4, *>
  E: <83, 4, *> <99, 4, *> <102, 4, *>

The deletion of <48, 1, *> at timestamp 4 causes node B to underflow
A sibling is chosen, say node C, and a key split copies the live entries of both nodes to new nodes D and E
Nodes B and C die; nodes D and E are created
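
The deletion case can be replayed in the same style. The sketch below assumes that the live entries of the underflowing node and its sibling are pooled, sorted by key and split into two equal halves; the slide does not state how the split point is chosen, so that rule and all names are assumptions made for illustration.

```python
STAR = None   # assumption: None stands for "*" (still alive)
MIN_LIVE = 2  # b * Pversion = 6 * 1/3 from the example


def logical_delete(entries, key, t):
    """Mark the live entry with this key as deleted at time t."""
    return [(k, ts, t) if (k == key and te is STAR) else (k, ts, te)
            for k, ts, te in entries]


def live_keys(entries):
    return [k for k, _, te in entries if te is STAR]


def close_at(entries, t):
    """Set tend = t for every live entry, so the node dies at t."""
    return [(k, ts, t if te is STAR else te) for k, ts, te in entries]


def merge_and_key_split(node, sibling, t):
    """Pool the live keys of both nodes and split them into two new nodes."""
    keys = sorted(live_keys(node) + live_keys(sibling))
    half = len(keys) // 2
    new_left = [(k, t, STAR) for k in keys[:half]]
    new_right = [(k, t, STAR) for k in keys[half:]]
    return close_at(node, t), close_at(sibling, t), new_left, new_right


node_b = [(43, 1, STAR), (48, 1, STAR), (52, 1, 2), (59, 1, 3), (68, 1, 3)]
node_c = [(72, 1, STAR), (78, 1, STAR), (83, 1, STAR), (95, 1, 3), (99, 1, STAR), (102, 1, STAR)]

node_b = logical_delete(node_b, 48, 4)
if len(live_keys(node_b)) < MIN_LIVE:  # only key 43 is still alive in B
    dead_b, dead_c, node_d, node_e = merge_and_key_split(node_b, node_c, 4)
    print(node_d)
    print(node_e)
```

With the example data this produces the keys 43, 72, 78 for node D and 83, 99, 102 for node E.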

Overview of MVB-trees (Continued)

Summary
  Insertions and deletions may cause overflow and underflow respectively, and thus trigger version splits
  Version splits create data redundancy for the duplicated entries. Such redundancy harms interval query performance, as both the original and the duplicated versions may need to be retrieved
Underflow: strong and weak versions
  Strong version underflow happens after a version split
  Weak version underflow occurs when the weak version condition is violated

Overview of MVB-trees (Continued)

MVB-trees require O(N/b) space, where N is the number of updates ever made to the database and b is the block capacity
Answering a timestamp range query requires O(log_b M + r/b) I/Os, where M is the number of live objects at the queried timestamp and r is the number of output objects

Overview of MVB-trees, HR-trees & 3D R-trees

Historical R-trees (HR-trees)
  Based on the overlapping technique, another framework for transforming a single-version data structure into a transaction-time access method
  The structure maintains an R-tree for each timestamp, but common branches of consecutive trees are stored only once in order to save space
  A timestamp query is directed to the corresponding R-tree and the search is performed inside that tree. Thus, the query degenerates into an ordinary window query and is handled very efficiently

Overview of Historical R-trees (Continued)

An interval query must search the corresponding trees of all the timestamps involved
Example: object e changes position at timestamp 1
  Object e is deleted from node E at timestamp 1
  Object e is added to node D at timestamp 1
  As no object in node A changes from timestamp 0 to 1, node A can be shared by both trees

[Figure: the R-trees for timestamp 0 (root R0, nodes A0, B0, C0, D0, E0) and timestamp 1 (root R1, new nodes B1, C1, D1, E1), with object e shown at its old position e0 and its new position e1.]

Overview of MVB-trees, HR-trees & 3D R-trees

3D R-trees
  View time as just another dimension and integrate it into the tree construction
  The movements of 2D objects can be modeled as distinct boxes in three-dimensional space
  The temporal projection of a box denotes the period during which the corresponding object remains static
  The spatial projections of the box correspond to the object’s position and extents during that period
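
A minimal sketch of this modeling step: a list of (timestamp, rectangle) observations is collapsed into 3D boxes, one per period in which the object stays put. The data layout, the half-open lifespans and the function names are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Box = Tuple[Rect, int, Optional[int]]     # (rect, t_from, t_to); t_to None = still there


def history_to_boxes(history: List[Tuple[int, Rect]]) -> List[Box]:
    """Collapse observations (sorted by timestamp) into one box per static period."""
    boxes: List[Box] = []
    for t, rect in history:
        if boxes and boxes[-1][0] == rect:
            continue  # the object did not move, so the current box keeps growing
        if boxes:
            prev_rect, t_from, _ = boxes[-1]
            boxes[-1] = (prev_rect, t_from, t)  # close the previous version at t
        boxes.append((rect, t, None))
    return boxes


def intersects(box: Box, window: Rect, t1: int, t2: int) -> bool:
    """Does the box overlap the query window at some timestamp in [t1, t2]?"""
    (x0, y0, x1, y1), t_from, t_to = box
    wx0, wy0, wx1, wy1 = window
    in_time = t_from <= t2 and (t_to is None or t1 < t_to)
    in_space = x0 <= wx1 and wx0 <= x1 and y0 <= wy1 and wy0 <= y1
    return in_time and in_space


loc_a, loc_b = (0, 0, 1, 1), (5, 5, 6, 6)
boxes = history_to_boxes([(1, loc_a), (2, loc_a), (3, loc_b)])
print(boxes)  # [((0, 0, 1, 1), 1, 3), ((5, 5, 6, 6), 3, None)]
```

A timestamp query is the special case `t1 == t2`, and an interval query uses `t1 < t2`.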

Overview of 3D R-trees (Continued)

No mechanism ensures that each node has a minimum number of live entries at a given timestamp
Poor timestamp and short-interval query performance
  A single tree stores the whole history
  A node may have a lot of dead space at a timestamp t, meaning that there is a high chance that the query window intersects the bounding box but no object inside it
  When there are many objects with long lifespans, the problem becomes more serious, because these objects force the nodes that contain them to have long lifespans as well
  Query cost depends on the total number of records, rather than on the number of records alive at the queried timestamp

Overview of 3D R-trees (Continued)

Good performance on long interval queries
  No redundancy
  R-trees optimize queries with similar extents along all dimensions

[Figure: 3D plot with axes x, y and time showing a node that bounds different objects at different locations over different time intervals. At a timestamp t only one live entry may be retrieved, so there is a high chance that the query window intersects the bounding box but no object inside it.]

MV3R-Trees

Why MV3R-trees are needed
  Currently, there is no structure that can effectively handle both timestamp and interval queries
Why MV3R-trees are good
  Reduce the structure size while improving query performance
  Are applicable to other multi-dimensional access methods when these are converted to the corresponding multi-version structures

MV3R-Trees (Continued)

How MV3R-trees work
  An MV3R-tree contains a multi-version R-tree (MVR-tree) and a small auxiliary 3D R-tree built on the leaf nodes of the MVR-tree
  An MVR-tree can contain multiple R-trees, which are referred to as “logical trees”
  Each entry has a form similar to that of MVB-tree entries: <S, tstart, tend, pointer>, where S denotes the spatial minimum bounding rectangle (MBR) as defined in R-trees
  The MVR-tree inherits the concept of the weak version condition

MV3R-Trees (Insertion and overflow)

Intermediate node insertion
  Allow redundancy in order to maintain good performance for timestamp and short-interval queries
  Process of insertion for intermediate nodes:
    Insertion → Block overflow → Version split → Strong version overflow → Key split

Example: insertion of object C6 at timestamp 10
  Node C holds <C1, 1, 3> <C2, 1, 3> <C3, 2, 8> <C4, 2, 8> <C5, 5, *>; the new entry is <C6, 10, *>
  Before the version split, node A holds ... <A1, 1, *, C> ...
  After the version split at timestamp 10, node A holds ... <A1, 1, 10, C> ... and node B holds ... <B1, 10, *, C> ...
  As the lifespan of A1 does not include timestamp 10, its MBR does not cover C6, and the MBR of A1 may be tightened
  The MBR of B1 is smaller than that of A1 because only C5 and C6 are bounded by B1 at timestamp 10

MV3R-Trees (Insertion and overflow)

Strong version underflows are not considered at all because
  Underflows in MVR-trees happen much less frequently than overflows
  Underflows are handled by entry re-insertion, which may trigger block overflows in several other nodes
Version splits need to take into account the spatial extents of the nodes

MV3R-Trees (Insertion and overflow)

Leaf node insertion
  Try to avert version splits, reducing redundancy and storage space while maintaining timestamp query performance
  A small number of leaf nodes facilitates interval query processing using the auxiliary 3D R-tree
Process of insertion for leaf nodes
  To avoid version splits, try the following alternatives in order:
    1. General key split
    2. Insert in node after reinserting one of its entries
    3. Insert in another node
    4. Version split
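
The ordering of the four alternatives can be pictured as a simple fallback chain. The sketch below shows only the control flow; the callables are hypothetical stand-ins that report whether an alternative succeeded, not implementations of the actual heuristics.

```python
from typing import Callable, List, Tuple

Strategy = Tuple[str, Callable[[], bool]]


def handle_leaf_overflow(strategies: List[Strategy]) -> str:
    """Try each alternative in order and return the name of the first that succeeds."""
    for name, attempt in strategies:
        if attempt():
            return name
    raise RuntimeError("no alternative succeeded")


# Hypothetical outcomes for one overflowing leaf: the first two alternatives
# fail, so the third one is used and the version split is never reached.
strategies: List[Strategy] = [
    ("general key split", lambda: False),
    ("insert in node after reinserting one of its entries", lambda: False),
    ("insert in another node", lambda: True),
    ("version split", lambda: True),  # last resort
]
print(handle_leaf_overflow(strategies))  # insert in another node
```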

MV3R-Trees (Insertion and overflow)

1. General Key Split
  When a new entry is to be inserted, the entries may be distributed over two nodes so that, for each timestamp in a time range, at least b · Pversion entries are alive in each node
  A version split, which generates version redundancy for entries, is thus avoided. However, such a distribution may be difficult or impossible to find
  The two new nodes should have small overlap
  A general key split differs from an ordinary key split, which is applied when all the entries in the node to be split are alive and their tstart equals the current time

2. Reinsert an Existing Entry of the Node
  Any leaf node can store a re-inserted entry provided that:
    Its lifespan covers that of the entry
    It is dead if the entry is dead
    Its area is not enlarged much, in order to ensure good performance for timestamp queries

Process of insertion for leaf nodes:
  Insertion → Block overflow → (1. General key split, 2. Insert in node after reinserting one of its entries, 3. Insert in another node, 4. Version split) → Strong version overflow → Key split
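
Whether a proposed general key split is admissible can be checked by brute force over discrete timestamps. The sketch below assumes that the relevant time range of a group runs from its earliest tstart to its latest tend (or to the current time if any entry is alive); that reading, the entry layout and the sample entries are assumptions made for illustration.

```python
STAR = None  # assumption: None stands for "*" (still alive)


def live_count(entries, t):
    return sum(1 for _, ts, te in entries if ts <= t and (te is STAR or t < te))


def lifespan(entries, now):
    """Timestamps covered by a group: earliest tstart up to its end (or now)."""
    start = min(ts for _, ts, _ in entries)
    if any(te is STAR for _, _, te in entries):
        return range(start, now + 1)
    return range(start, max(te for _, _, te in entries))


def group_ok(entries, now, min_live):
    """At least min_live entries must be alive at every timestamp of the lifespan."""
    return bool(entries) and all(
        live_count(entries, t) >= min_live for t in lifespan(entries, now)
    )


def general_key_split_ok(group1, group2, now, min_live):
    return group_ok(group1, now, min_live) and group_ok(group2, now, min_live)


# Hypothetical entries given as (name, tstart, tend)
good = ([("e1", 1, 5), ("e2", 1, 5)],
        [("e3", 1, STAR), ("e4", 1, STAR), ("e5", 3, STAR), ("e6", 10, STAR)])
bad = ([("C1", 1, 3), ("C2", 1, 3), ("C3", 2, 8), ("C4", 2, 8)],
       [("C5", 5, STAR), ("C6", 10, STAR)])

print(general_key_split_ok(*good, now=10, min_live=2))  # True
print(general_key_split_ok(*bad, now=10, min_live=2))   # False
```

The second split fails because the group holding C5 and C6 has only one live entry between timestamps 5 and 9, which is the kind of situation in which a general key split is impossible.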

MV3R-Trees (Insertion and overflow)

2. Reinsert an Existing Entry of the Node (continued)
  Among the candidates for reinsertion, dead entries always come before live ones, as live entries can be reinserted only into live nodes, which may also overflow and cause the same problem in the future
  Among the dead entries and among the live entries, sorting is based on the decrease in the area of the node MBR caused by removing the entry
  Only a single entry is reinserted, even if it is possible to reinsert more, because
    Reinsertion saves space but does not improve the structure
    Reinserting a single entry already achieves the objective of averting the version split

3. Insert in Another Node
  Try to insert the new entry into another node that is not full
  Backtrack to the upper level and try to insert the entry into another branch
  Only consider branches that incur a small area enlargement
  The area enlargement of a candidate branch may exceed that of the best branch only by a certain percentage

MV3R-Trees (Insertion and overflow)

Conclusion
  General Key Split
    Does not require reading any more pages, so it is the most efficient method in terms of update cost
    Reduces the number of entries in the new nodes so that they will not overflow again in the near future
  Reinsert an Existing Entry of the Node
    Can search multiple branches
    Update costs are compensated by the space savings
  Insert in Another Node
    Requires backtracking up to level 2
    Update costs are compensated by the space savings

MV3R-Trees (Deletion and Underflow)

If a deletion does not incur structural changes, it is handled in a way similar to R*-trees
An entry is physically deleted only if its tstart equals the current time (multiple updates may happen at the same timestamp)
Deletions in intermediate nodes and in leaf nodes are handled in a way similar to the corresponding insertions
Intermediate node deletion
  Suppose an underflow occurs at the current timestamp t: the tend of the live entries of the node is set to t
  These entries are then re-inserted into the most recent logical R-tree after setting tstart = t
  The R*-tree algorithms are applied

MV3R-Trees (Deletion and Underflow)

Leaf node deletion
  To avoid the redundancy caused by reinsertion, a live entry is borrowed from a sibling node
  The borrowed (moved) entry must have the following properties:
    It is alive, and its lifespan is covered by that of the underflowing node, say node A
    After the entry is removed, the version condition of the lending sibling, say node B, is still satisfied
    Inserting the entry into node A does not enlarge the MBR of A above a threshold
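
The three properties translate into a short eligibility check. In the sketch below the minimum of 3 live entries, the rectangles, the area threshold and all names are hypothetical values chosen for illustration; only the three conditions themselves come from the list above.

```python
STAR = None  # assumption: None stands for "*" (still alive)


def live_count(entries, t):
    return sum(1 for e in entries if e["ts"] <= t and (e["te"] is STAR or t < e["te"]))


def enlarged_area(mbr, rect):
    """Area of the smallest rectangle covering both mbr and rect."""
    x0, y0, x1, y1 = mbr
    rx0, ry0, rx1, ry1 = rect
    return (max(x1, rx1) - min(x0, rx0)) * (max(y1, ry1) - min(y0, ry0))


def can_borrow(entry, lender, a_start, a_mbr, now, min_live, max_area):
    # 1. The entry is alive and its lifespan is covered by that of node A.
    if entry["te"] is not STAR or entry["ts"] < a_start:
        return False
    # 2. Without the entry, the lender keeps enough live entries at every
    #    timestamp of the entry's lifespan.
    rest = [e for e in lender if e is not entry]
    if any(live_count(rest, t) < min_live for t in range(entry["ts"], now + 1)):
        return False
    # 3. The MBR of node A does not grow above the threshold.
    return enlarged_area(a_mbr, entry["rect"]) <= max_area


def make(name, rect, ts, te=STAR):
    return {"name": name, "rect": rect, "ts": ts, "te": te}


node_b = [
    make("B1", (4, 4, 5, 5), 1, 2), make("B2", (5, 0, 6, 1), 1), make("B3", (6, 2, 7, 3), 1),
    make("B4", (4, 1, 5, 2), 2), make("B5", (6, 4, 7, 5), 2), make("B6", (7, 0, 8, 1), 2),
]
a_mbr = (0, 0, 4, 4)  # hypothetical MBR of node A, which was created at time 1

for e in node_b:
    print(e["name"], can_borrow(e, node_b, a_start=1, a_mbr=a_mbr, now=2, min_live=3, max_area=24))
```

With these values, B2 and B3 are rejected because removing either would leave node B with only two live entries at timestamp 1, while B4 passes all three checks.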

MV3R-Trees (Deletion and Underflow)

Example:

Timestamp 1
  S (root): ... <S1, 1, *, A> <S2, 1, *, B> ...
  A: <A1, 1, *> <A2, 1, *> <A3, 1, *>
  B: <B1, 1, *> <B2, 1, *> <B3, 1, *>

Timestamp 2, before borrowing
  A: <A1, 1, 2> <A2, 1, *> <A3, 1, *>
  B: <B1, 1, 2> <B2, 1, *> <B3, 1, *> <B4, 2, *> <B5, 2, *> <B6, 2, *>

Timestamp 2, after borrowing
  A: <A1, 1, 2> <A2, 1, *> <A3, 1, *> <A4, 2, *>
  B: <B1, 1, 2> <B2, 1, *> <B3, 1, *> <B5, 2, *> <B6, 2, *>

The deletion of object A1 at timestamp 2 causes node A to underflow
Objects B4, B5 and B6 are inserted at timestamp 2
Object B4 has been borrowed by A (shown as <A4, 2, *> in node A)
Objects B2 and B3 cannot be moved because their deletion from B would cause a weak version underflow for timestamp 1

MV3R-Trees (Auxiliary 3D R-tree)

Built on the leaves of the MVR-tree in order to process interval queries
For a given moderate node capacity, the number of leaf nodes in an MVR-tree is much lower than the actual number of objects
  Smaller in size than a complete 3D R-tree
Adding the auxiliary 3D R-tree not only improves interval query performance, but may also provide flexibility in other scenarios
Construction of the auxiliary 3D R-tree
  Whenever a leaf node of the MVR-tree is updated, the change is propagated to its entry in the 3D R-tree

Query Processing with MV3R-trees

MVR-tree
  A timestamp query retrieves the root whose jurisdiction interval covers the queried timestamp; the search then proceeds as in R-trees
3D R-tree
  For interval queries, multiple trees may need to be searched
  Duplicate visits to the same node via different parents should be avoided, as they result in severe I/O cost
    Duplicate pointers to a node are created by version splits or entry reinsertions
    In both cases, the two entries pointing to the same node have disjoint lifespans
  For short intervals, it is used whenever the temporal query length exceeds a certain threshold
  Its performance deteriorates gradually as the tree grows
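
Finding the root for a timestamp query amounts to a lookup over the jurisdiction intervals. The sketch below assumes that the intervals are contiguous, so that each root is responsible from its own start time up to the start time of the next root; the class and its methods are hypothetical and not part of any library.

```python
import bisect


class RootDirectory:
    """Roots kept in increasing order of the start of their jurisdiction interval."""

    def __init__(self):
        self._starts = []
        self._roots = []

    def add_root(self, start, root):
        # Assumption: roots are registered in increasing order of start time.
        if self._starts and start <= self._starts[-1]:
            raise ValueError("roots must be added in increasing start order")
        self._starts.append(start)
        self._roots.append(root)

    def root_for(self, t):
        """Return the root whose jurisdiction interval covers timestamp t."""
        i = bisect.bisect_right(self._starts, t) - 1
        if i < 0:
            raise KeyError(f"no root covers timestamp {t}")
        return self._roots[i]


roots = RootDirectory()
roots.add_root(1, "R1")    # covers timestamps 1 to 9
roots.add_root(10, "R2")   # covers timestamps 10 to 19
roots.add_root(20, "R3")   # covers timestamps from 20 on
print(roots.root_for(9), roots.root_for(10), roots.root_for(25))  # R1 R2 R3
```

Once the root is found, the search continues as an ordinary window query that only follows entries alive at the queried timestamp.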

Query Processing with MV3R-trees

Example
  Node A: ... <A1, 1, 10, C> ...
  Node B: ... <B1, 10, 20, C> ...
  Node C: <C1, 1, 8, D> <C2, 1, 12, E> <C3, 10, 20, F> <C4, 10, 20, G>

[Figure: 3D plot with axes x, y and time showing the cubes of A1, B1, C1, C2, C3 and C4 together with the query cube.]

1. A1 and B1 are temporally adjacent
2. A1 spatially covers C1 and C2
3. B1 spatially covers C2, C3 and C4
4. C2 and C3 intersect the query box, so their subtrees (nodes E and F) should be searched
5. Since node C may be reached twice (by following A1 and B1), we may attempt redundant visits to E and F
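
The effect of the redundant visits can be seen by counting node accesses on this example. The sketch below uses a plain visited set to suppress duplicates, which is a simple stand-in and not necessarily the technique used by MV3R-trees. The query interval [9, 11] and the set of entries assumed to intersect the query window spatially are hypothetical.

```python
# Entries are (name, tstart, tend, child); the leaves D to G are left empty.
NODES = {
    "A": [("A1", 1, 10, "C")],
    "B": [("B1", 10, 20, "C")],
    "C": [("C1", 1, 8, "D"), ("C2", 1, 12, "E"), ("C3", 10, 20, "F"), ("C4", 10, 20, "G")],
    "D": [], "E": [], "F": [], "G": [],
}
# Assumption: these entries intersect the query window spatially.
SPATIAL_HITS = {"A1", "B1", "C2", "C3"}


def search(start_nodes, q_start, q_end, dedupe):
    """Depth-first interval search; returns the list of node accesses in order."""
    visited, accesses = set(), []
    stack = list(reversed(start_nodes))
    while stack:
        node = stack.pop()
        if dedupe and node in visited:
            continue  # already reached through another parent entry
        visited.add(node)
        accesses.append(node)
        for name, ts, te, child in reversed(NODES[node]):
            # Follow entries that hit the window and whose lifespan [ts, te)
            # overlaps the query interval [q_start, q_end].
            if name in SPATIAL_HITS and ts <= q_end and q_start < te:
                stack.append(child)
    return accesses


print(search(["A", "B"], 9, 11, dedupe=False))
# ['A', 'C', 'E', 'F', 'B', 'C', 'E', 'F']
print(search(["A", "B"], 9, 11, dedupe=True))
# ['A', 'C', 'E', 'F', 'B']
```

Without duplicate elimination, nodes C, E and F are each read twice, giving eight node accesses instead of five.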

Conclusion of MV3R-trees

A structure that combines the concepts of MVB-trees and 3D R-trees
MV3R-trees can handle timestamp and interval queries efficiently with relatively small space requirements
MV3R-trees could be further improved by:
  Analytical cost models for determining the optimal tree to answer short interval queries
  Overflow and underflow handling heuristics that are more efficient in terms of update cost and can avert more version splits


References

Tao, Y., Papadias, D. The MV3R-Tree: A Spatio-Temporal Access Method for Timestamp and Interval Queries, 2000