
Page 1: Indexing Spatio-Temporal Data Warehouses

Indexing Spatio-Temporal Data Warehouses

Dimitris Papadias, Yufei Tao, Panos Kalnis, Jun Zhang

Department of Computer Science

Hong Kong University of Science and Technology

Clear Water Bay, Hong Kong

26 Feb 2002

This work was supported by grants HKUST 6081/01E and 6070/00E from Hong Kong RGC.

Page 2: Indexing Spatio-Temporal Data Warehouses

Outline

• Preliminary – Spatial data warehouses and aggregate trees

• Applications and motivation

• Solution for static objects

• Solution for dynamic objects

• Performance study

• Conclusion

Page 3: Indexing Spatio-Temporal Data Warehouses

Preliminaries – Spatial Data Warehouses

• Each spatial object carries some aggregate information (e.g., each region may carry its population).

• A common query is the window aggregate query, which specifies a query window and returns an aggregate (e.g., the sum) over all objects intersecting it.

– This is the analogue of "group-by" in conventional data warehouses.

• Materialization techniques common in traditional data warehouses are of limited use, since the possible query positions are infinite.

– The "group-by" is ad hoc.

[Figure: regions R1 to R4 with aggregate values 150, 75, 132, and 12, and a query window qs.]
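A minimal sketch of a window aggregate query, evaluated by brute force over every region. The aggregate values (150, 75, 132, 12) come from the slide; the rectangle coordinates, the query window, and the names `Rect`, `intersects`, and `window_sum` are assumptions invented for illustration.

```python
from typing import Dict, NamedTuple, Tuple


class Rect(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float


def intersects(a: Rect, b: Rect) -> bool:
    """True when two axis-aligned rectangles share at least one point."""
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


def window_sum(regions: Dict[str, Tuple[Rect, int]], window: Rect) -> int:
    """Sum the aggregates of all regions that intersect the query window."""
    return sum(value for extent, value in regions.values()
               if intersects(extent, window))


# Hypothetical layout: only the aggregate values are taken from the slide.
regions = {
    "R1": (Rect(0, 0, 2, 2), 150),
    "R2": (Rect(3, 0, 5, 2), 75),
    "R3": (Rect(0, 3, 2, 5), 132),
    "R4": (Rect(3, 4, 5, 5), 12),
}
qs = Rect(1, 1, 4, 3.5)  # hypothetical query window

# With this layout qs touches R1, R2 and R3 but not R4.
print(window_sum(regions, qs))
```

Every query scans all regions here, which is the cost an index is meant to avoid.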

Page 4: Indexing Spatio-Temporal Data Warehouses

Preliminaries – Spatial Data Warehouses

• A better approach is to use aggregate trees, which introduce a spatial hierarchy [Kline and Snodgrass, 1995; Papadias et al., 2001; Lazaridis and Mehrotra, 2001].

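[Figure: aggregation R-tree. Leaf entries: R1 = 150, R2 = 75, R3 = 132, R4 = 12. Intermediate entries: R5 = 225 (the sum over R1 and R2) and R6 = 144 (the sum over R3 and R4). Query qs retrieves the sum of the aggregates of the objects intersecting it.]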
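The benefit of storing aggregates in intermediate entries can be sketched as follows: when the query window fully contains an entry's bounding box, the stored aggregate is used and the subtree is never visited. The aggregate values are those of the figure; the coordinates, the query window, and the names `Entry` and `ar_query` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Entry:
    name: str
    mbr: Box
    agg: int  # aggregate of everything below this entry
    children: List["Entry"] = field(default_factory=list)


def ar_query(entries: List[Entry], window: Box, visited: List[str]) -> int:
    """Sum aggregates of objects intersecting `window`; log descended entries."""
    total = 0
    for e in entries:
        if not intersects(e.mbr, window):
            continue
        if contains(window, e.mbr) or not e.children:
            total += e.agg          # fully covered subtree, or a leaf object
        else:
            visited.append(e.name)  # partial overlap: must look inside
            total += ar_query(e.children, window, visited)
    return total


# Hypothetical coordinates; aggregates as in the figure.
r5 = Entry("R5", (0, 0, 5, 2), 225, [Entry("R1", (0, 0, 2, 2), 150),
                                     Entry("R2", (3, 0, 5, 2), 75)])
r6 = Entry("R6", (0, 3, 5, 5), 144, [Entry("R3", (0, 3, 2, 5), 132),
                                     Entry("R4", (3, 4, 5, 5), 12)])
visited: List[str] = []
total = ar_query([r5, r6], (-1, -1, 6, 3.5), visited)
print(total, visited)
```

In this layout R5 is answered from its stored aggregate, and only R6 has to be opened.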

Page 5: Indexing Spatio-Temporal Data Warehouses

Spatio-Temporal DW: Applications and Motivation

• Spatio-temporal databases deal with objects whose properties may change with time.

• Traditional studies of spatio-temporal databases focus on retrieving the actual objects that satisfy the query predicates.

– Retrieve all vehicles that appeared in the north district between 3pm and 5pm yesterday.

• A more useful type of query may retrieve, instead of the actual object IDs, the number of objects that satisfy the query conditions.

– Retrieve the (approximate) number of vehicles in the north district between 3pm and 5pm yesterday.

• In the above example, the spatial objects (e.g., streets in the north district) that carry the aggregate information (e.g., number of cars) are static. Other queries may involve dynamic objects.

– Mobile phone antennas (aggregate information = number of users served by the antenna), whose spatial extents (i.e., coverage areas) may change over time.

Page 6: Indexing Spatio-Temporal Data Warehouses

Example (Static Objects)

FACT TABLE (aggregate value of each region at each timestamp)

region    T1     T2     T3     T4     T5 (now)    sum over time
R1        150    150    145    135    130          710
R2         75     80     85     90     90          420
R3        132    127    125    127    127          638
R4         12     12     12     12     12           60
sum       369    369    367    364    359         1828

[Figure: regions R1 to R4 and query window qs.]

Query qs retrieves the aggregate sum (during the interval T1-T4) of all rectangles that intersect it.

Page 7: Indexing Spatio-Temporal Data Warehouses

Traditional Methods

• Pre-materialization

– Even more difficult than spatial DW due to the inclusion of the temporal dimension.

• Use an aggregation tree.

– When the aggregate of a region changes, create a 3D box. An aggregate 3D R-tree is used to index all these boxes.

– Problem: The spatial extent of a region must be duplicated many times although it does not change.

[Figure: 3D boxes for region R1, one per aggregate value: 150 (starting at T1), 145 (T3), 135 (T4), and 130 (T5).]
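The duplication problem can be illustrated with a small sketch that builds the boxes for one region: a new box is started whenever the aggregate changes, and each box repeats the same spatial extent. The aggregate history of R1 is taken from the example; the extent coordinates and the name `boxes_for_region` are assumptions.

```python
def boxes_for_region(extent, history):
    """Build one 3D box per run of equal aggregate values.

    `history` holds one aggregate value per timestamp, starting at timestamp 1.
    """
    boxes = []
    for t, value in enumerate(history, start=1):
        if boxes and boxes[-1]["agg"] == value:
            boxes[-1]["t_end"] = t  # value unchanged: extend the current box
        else:
            boxes.append({"extent": extent, "t_start": t, "t_end": t, "agg": value})
    return boxes


r1_extent = (0, 0, 2, 2)  # hypothetical coordinates
r1_boxes = boxes_for_region(r1_extent, [150, 150, 145, 135, 130])

for box in r1_boxes:
    print(box)
```

R1 never moves, yet its extent is stored four times, once per box.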

Page 8: Indexing Spatio-Temporal Data Warehouses

Aggregate RB-tree

[Figure: aggregate RB-tree for the fact table of the static example.]

R-tree for the spatial dimension (each entry stores its total over all timestamps):

– Leaf entries: R1 = 710, R2 = 420, R3 = 638, R4 = 60

– Intermediate entries: R5 = 1130, R6 = 698

Each R-tree entry points to a B-tree over time. Pairs are (timestamp, aggregate): a leaf pair is recorded only when the value changes, and a root pair holds the sum over the timestamps covered by its child.

B-tree         leaf pairs                             root pairs
R1             (1, 150) (3, 145) (4, 135) (5, 130)    (1, 445) (4, 265)
R2             (1, 75) (2, 80) (3, 85) (4, 90)        (1, 155) (3, 265)
R3             (1, 132) (2, 127) (3, 125) (4, 127)    (1, 259) (3, 379)
R4             (1, 12)                                single node
R5             (1, 225) (2, 230) (4, 225) (5, 220)    (1, 685) (4, 445)
R6             (1, 144) (2, 139) (3, 137) (4, 139)    (1, 283) (3, 415)
whole space    (1, 369) (3, 367) (4, 364) (5, 359)    (1, 1105) (4, 723)

Spatial extents are stored only once.
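A rough sketch of how a query could combine the two structures: the R-tree prunes by space, and the B-tree of each qualifying entry answers the time interval. Here each B-tree is replaced by a sorted list of change points searched with `bisect`, which is a simplification and not the structure on the slide. The (timestamp, aggregate) pairs are those shown for R1 to R6; the coordinates, the query window, and the names `ChangeLog`, `Entry`, and `arb_query` are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


class ChangeLog:
    """Stand-in for a per-entry B-tree: sorted (timestamp, aggregate) changes."""

    def __init__(self, changes):
        self.times = [t for t, _ in changes]
        self.values = [v for _, v in changes]

    def sum_over(self, t_start: int, t_end: int) -> int:
        total = 0
        for t in range(t_start, t_end + 1):
            i = bisect_right(self.times, t) - 1  # latest change at or before t
            if i >= 0:
                total += self.values[i]
        return total


@dataclass
class Entry:
    name: str
    mbr: Box
    log: ChangeLog
    children: List["Entry"] = field(default_factory=list)


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def arb_query(entries, window, t_start, t_end):
    total = 0
    for e in entries:
        if not intersects(e.mbr, window):
            continue
        if contains(window, e.mbr) or not e.children:
            total += e.log.sum_over(t_start, t_end)  # answered by this entry's B-tree
        else:
            total += arb_query(e.children, window, t_start, t_end)
    return total


# Hypothetical coordinates; change points as listed for each B-tree.
r1 = Entry("R1", (0, 0, 2, 2), ChangeLog([(1, 150), (3, 145), (4, 135), (5, 130)]))
r2 = Entry("R2", (3, 0, 5, 2), ChangeLog([(1, 75), (2, 80), (3, 85), (4, 90)]))
r3 = Entry("R3", (0, 3, 2, 5), ChangeLog([(1, 132), (2, 127), (3, 125), (4, 127)]))
r4 = Entry("R4", (3, 4, 5, 5), ChangeLog([(1, 12)]))
r5 = Entry("R5", (0, 0, 5, 2), ChangeLog([(1, 225), (2, 230), (4, 225), (5, 220)]), [r1, r2])
r6 = Entry("R6", (0, 3, 5, 5), ChangeLog([(1, 144), (2, 139), (3, 137), (4, 139)]), [r3, r4])

qs = (-1, -1, 6, 3.5)
result = arb_query([r5, r6], qs, 1, 4)
print(result)
```

A real B-tree with sums in its internal nodes would answer the interval without stepping through every timestamp; the loop in `sum_over` is kept only for brevity.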

Page 9: Indexing Spatio-Temporal Data Warehouses

Example (Dynamic Objects)

[Fact table: identical to the one in the static example.]

Query qs retrieves the aggregate sum (during the interval T1-T4) of all rectangles that intersect it.

[Figure: regions R1 to R4 and query window qs, showing the situation during timestamps 1-4.]

Page 10: Indexing Spatio-Temporal Data Warehouses

Example (cont.)

[Fact table: identical to the one in the static example.]

Query qs retrieves the aggregate sum (during the interval T1-T4) of all rectangles that intersect it.

[Figure: regions R1 to R4 and query window qs after a change of position at timestamp 5.]

Page 11: Indexing Spatio-Temporal Data Warehouses

Aggregate HRB-tree

• Integrates the previous idea with HR-trees, a spatio-temporal access method.

[Figure: two R-trees for the spatial dimension. The tree for timestamps 1-4 has leaf entries R1, R2, R3, and R4 under intermediate entries R5 and R6. The tree for timestamp 5 has leaf entries R'1 and R2 under R'5, next to R6. Each entry points to its own B-tree (for R1, R2, R3, R4, R5, R6, R'1, and R'5).]
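HR-trees keep one logical R-tree per timestamp and let consecutive versions share the branches that did not change. The sketch below shows only that sharing, by path copying on entry names: bounding boxes and the attached B-trees are left out, and the names `Node` and `new_version` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    name: str
    children: Tuple["Node", ...] = ()


def new_version(entries, old_name, new_leaf):
    """Return a new tuple of entries with `old_name` replaced by `new_leaf`.

    Only the entries on the path to the replaced leaf are recreated;
    every other entry object is reused as-is.
    """
    result = []
    for e in entries:
        if e.name == old_name:
            result.append(new_leaf)
        elif e.children:
            kids = new_version(e.children, old_name, new_leaf)
            if all(k is c for k, c in zip(kids, e.children)):
                result.append(e)  # nothing changed below: share the entry
            else:
                result.append(Node(e.name[0] + "'" + e.name[1:], kids))
        else:
            result.append(e)
    return tuple(result)


r1, r2, r3, r4 = Node("R1"), Node("R2"), Node("R3"), Node("R4")
v1 = (Node("R5", (r1, r2)), Node("R6", (r3, r4)))  # timestamps 1-4
v5 = new_version(v1, "R1", Node("R'1"))            # timestamp 5
versions = {1: v1, 5: v5}

print([e.name for e in v5], v5[1] is v1[1])
```

Only R'1 and R'5 are new at timestamp 5; R2 and the whole R6 branch are the same objects in both versions.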

Page 12: Indexing Spatio-Temporal Data Warehouses

Aggregate 3D RB-tree

• Creates a 3D box only when the spatial extent of an object changes.

[Figure: 3D boxes along the time axis for R1 to R6, with new boxes R'1 and R'5 starting at time 5. Each box points to its own B-tree (for R1, R2, R3, R4, R5, R6, R'1, and R'5).]
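A minimal sketch of the rule stated above: a new 3D box is opened only when the spatial extent changes, while aggregate changes are appended to the change list (the B-tree stand-in) of the current box. The aggregate values of R1 follow the example; the coordinates and the name `a3drb_entries` are assumptions.

```python
def a3drb_entries(history):
    """Group one region's history into 3D boxes.

    `history` is a list of (timestamp, extent, aggregate), sorted by timestamp.
    Each box keeps its own list of (timestamp, aggregate) change points.
    """
    boxes = []
    for t, extent, agg in history:
        if not boxes or boxes[-1]["extent"] != extent:
            # The extent changed (or this is the first record): open a new box.
            boxes.append({"extent": extent, "t_start": t, "t_end": t,
                          "changes": [(t, agg)]})
            continue
        box = boxes[-1]
        box["t_end"] = t
        if box["changes"][-1][1] != agg:
            box["changes"].append((t, agg))  # aggregate change only
    return boxes


old, moved = (0, 0, 2, 2), (1, 0, 3, 2)  # hypothetical extents of R1 and R'1
r1_history = [(1, old, 150), (2, old, 150), (3, old, 145),
              (4, old, 135), (5, moved, 130)]
boxes = a3drb_entries(r1_history)

for box in boxes:
    print(box)
```

R1 yields two boxes here, compared with one box per aggregate change when every change creates a box.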

Page 13: Indexing Spatio-Temporal Data Warehouses

Managing Numerous B-trees

• If each B-tree is too small (e.g., when spatial extents change about as often as aggregates):

– A block contains too few entries and much space is wasted.

– Not suitable for caching.

• Our solution is to use a B-File, which "packs" numerous B-trees into a single file.

– Avoids wasted space in disk pages.

– Maintains the same query performance.
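The slides do not describe the B-File layout, so the following is only one possible way to pack small trees: entries of several trees are placed back to back in fixed-capacity pages, and a directory records where each tree lives. The page capacity, the directory format, and the names `pack_bfile` and `read_tree` are all assumptions.

```python
def pack_bfile(trees, page_capacity):
    """Pack small trees into shared pages.

    `trees` maps a tree id to its list of entries. Returns (pages, directory),
    where directory[tree_id] = (page number, offset in page, entry count).
    """
    pages = [[]]
    directory = {}
    for tree_id, entries in trees.items():
        if len(entries) > page_capacity:
            raise ValueError("this sketch only packs trees that fit in one page")
        if len(pages[-1]) + len(entries) > page_capacity:
            pages.append([])  # current page is full: start a new one
        directory[tree_id] = (len(pages) - 1, len(pages[-1]), len(entries))
        pages[-1].extend(entries)
    return pages, directory


def read_tree(pages, directory, tree_id):
    """Fetch one tree's entries with a single page access."""
    page_no, offset, count = directory[tree_id]
    return pages[page_no][offset:offset + count]


# Leaf entries of the B-trees for R1 to R4 from the static example.
trees = {
    "R1": [(1, 150), (3, 145), (4, 135), (5, 130)],
    "R2": [(1, 75), (2, 80), (3, 85), (4, 90)],
    "R3": [(1, 132), (2, 127), (3, 125), (4, 127)],
    "R4": [(1, 12)],
}
pages, directory = pack_bfile(trees, page_capacity=8)  # capacity is hypothetical

print(len(pages), directory)
```

With one page per tree these four trees would occupy four pages; packed this way they occupy two, and each tree is still read from a single page.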

Page 14: Indexing Spatio-Temporal Data Warehouses

Performance

• Dataset settings

– Number of spatial objects = 10,000

– History length = 1,000 timestamps

– Aggregate agility: how fast the aggregate information changes (4%, 8%, 16%, 32%, 64%)

– Region agility: how fast the spatial extents change

  • 0% for static objects

  • 0.01% for dynamic objects (reflecting the fact that spatial extents change much more slowly than aggregate data)

– Datasets contain 500,000 to 6,500,000 records.

• Each query has two parameters: spatial extent and interval length.

Page 15: Indexing Spatio-Temporal Data Warehouses

Results (Static Objects)

[Figure: index size in megabytes (axis 0 to 400) versus aggregate agility (4%, 8%, 16%, 32%, 64%) for aRB and a3DR.]

[Figure: node accesses (axis 0 to 180) versus aggregate agility (4%, 8%, 16%, 32%, 64%) for aRB and a3DR.]

Page 16: Indexing Spatio-Temporal Data Warehouses

Results (Static Objects)

[Figure: node accesses (axis 0 to 180) versus query length (1, 25, 50, 75, 100) for aRB and a3DR.]

[Figure: node accesses (axis 0 to 180) versus query extent (1%, 3%, 5%, 7%, 9%) for aRB and a3DR.]

Page 17: Indexing Spatio-Temporal Data Warehouses

Results (Dynamic Objects)

[Figure: index size in megabytes (axis 0 to 400) versus aggregate agility (4%, 8%, 16%, 32%, 64%) for a3DR, a3DRB, and aHRB.]

[Figure: node accesses (axis 0 to 180) versus aggregate agility (4%, 8%, 16%, 32%, 64%) for a3DR, a3DRB, and aHRB.]

Page 18: Indexing Spatio-Temporal Data Warehouses

Results (Dynamic Objects)

[Figure: node accesses (axis 0 to 200) versus query length (1, 25, 50, 75, 100) for a3DR, a3DRB, and aHRB.]

[Figure: node accesses (axis 0 to 200) versus query extent (1%, 3%, 5%, 7%, 9%) for a3DR, a3DRB, and aHRB.]

Page 19: Indexing Spatio-Temporal Data Warehouses

Conclusion

• We propose indexing techniques that replace the data cube in spatio-temporal data warehouses and answer ad-hoc group-by queries very efficiently.

– Both static and dynamic spatial dimensions are discussed.

• Extensions

– Cost models that predict the performance of alternative structures.

– Query optimization based on the cost models.

– Complex query evaluation.