Indexing Spatio-Temporal Data Warehouses
Dimitris Papadias, Yufei Tao, Panos Kalnis, Jun Zhang
Department of Computer Science
Hong Kong University of Science and Technology
Clear Water Bay, Hong Kong
26 Feb 2002
This work was supported by grants HKUST 6081/01E and 6070/00E from Hong Kong RGC.
Outline
• Preliminary – Spatial data warehouses and aggregate trees
• Applications and motivation
• Solution for static objects
• Solution for dynamic objects
• Performance study
• Conclusion
Preliminaries – Spatial Data Warehouses
• Each spatial object carries some aggregate information (e.g., each region may be associated with its population).
• A common query is the window aggregate query, which specifies a query window and retrieves the aggregate sum of all objects intersecting it.
– Analogous to the “group-by” in conventional data warehouses.
• Materialization techniques common in traditional data warehouses are of limited use, because the number of possible query positions is infinite.
– Ad-hoc “group-by”.
[Figure: regions R1, R2, R3 and R4 with aggregates 150, 75, 132 and 12, and a query window qs]
Preliminaries – Spatial Data Warehouses
• A better approach is to deploy aggregate trees that impose a spatial hierarchy on the objects [Kline and Snodgrass, 1995; Papadias et al., 2001; Lazaridis and Mehrotra, 2001].
[Figure: aggregation R-tree. The root holds R5 (aggregate 225 = 150 + 75, enclosing R1 and R2) and R6 (aggregate 144 = 132 + 12, enclosing R3 and R4); the query retrieves the sum of the aggregates of all objects intersecting qs.]
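The traversal that an aggregation R-tree enables can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the names `Rect`, `Entry`, `node` and `window_sum` and all coordinates are hypothetical, and only the aggregates 150, 75, 132 and 12 come from the figure. An entry whose rectangle lies entirely inside the query contributes its stored sum without being descended, which is where the savings in node accesses come from.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Rect:
    x1: float
    y1: float
    x2: float
    y2: float

    def intersects(self, o: "Rect") -> bool:
        return self.x1 <= o.x2 and o.x1 <= self.x2 and self.y1 <= o.y2 and o.y1 <= self.y2

    def contains(self, o: "Rect") -> bool:
        return self.x1 <= o.x1 and o.x2 <= self.x2 and self.y1 <= o.y1 and o.y2 <= self.y2


@dataclass
class Entry:
    mbr: Rect
    agg: float  # sum of the aggregates of all objects below this entry
    children: List["Entry"] = field(default_factory=list)  # empty for a data object


def node(children: List[Entry]) -> Entry:
    """Build an intermediate entry whose MBR and aggregate summarize its children."""
    mbr = Rect(min(c.mbr.x1 for c in children), min(c.mbr.y1 for c in children),
               max(c.mbr.x2 for c in children), max(c.mbr.y2 for c in children))
    return Entry(mbr, sum(c.agg for c in children), children)


def window_sum(e: Entry, q: Rect) -> float:
    """Aggregate sum of all objects under `e` whose extents intersect q."""
    if not e.mbr.intersects(q):
        return 0  # disjoint subtree: pruned
    if not e.children or q.contains(e.mbr):
        # A data object that intersects q, or a subtree lying entirely inside q:
        # every object below intersects q, so the stored aggregate is used directly.
        return e.agg
    return sum(window_sum(c, q) for c in e.children)  # partial overlap: descend


# Hypothetical coordinates; aggregates as in the figure.
r1, r2 = Entry(Rect(0, 0, 2, 2), 150), Entry(Rect(3, 0, 5, 2), 75)
r3, r4 = Entry(Rect(0, 5, 2, 7), 132), Entry(Rect(6, 5, 8, 7), 12)
root = node([node([r1, r2]), node([r3, r4])])  # children play the roles of R5 (225) and R6 (144)
print(window_sum(root, Rect(-1, -1, 5.5, 6)))  # R5 taken whole (225) plus R3 (132): 357
```

Here the query covers the R5 entry completely, so R1 and R2 are never visited; only the partially overlapping R6 entry is expanded.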
Spatio-Temporal DW: Applications and Motivation
• Spatio-temporal databases deal with objects whose properties may change with time.
• Traditional studies in spatio-temporal databases focus on retrieving the actual objects that satisfy the query predicates.
– Retrieve all vehicles that appear in the north district between 3pm and 5pm yesterday.
• A more useful type of query may retrieve, instead of the actual object IDs, the number of objects that satisfy the query conditions.
– Retrieve the (approximate) number of vehicles in the north district during 3pm-5pm yesterday.
• In the above example, the spatial objects (i.e., streets in the north district) that carry aggregate information (i.e., number of cars) are static. Other queries may involve dynamic objects.
– For example, mobile phone antennas, whose aggregate information is the number of users served and whose spatial extents (i.e., coverage areas) may change over time.
Example (Static Objects)
FACT TABLE (aggregate of each region at each timestamp; T5 = now)
Region              T1    T2    T3    T4    T5    Per-region sum
R1                 150   150   145   135   130     710
R2                  75    80    85    90    90     420
R3                 132   127   125   127   127     638
R4                  12    12    12    12    12      60
Per-timestamp sum  369   369   367   364   359    1828 (total sum)
[Figure: regions R1 to R4 and a query window qs]
Query qs retrieves the aggregate sum, during T1-T4, of all rectangles that intersect it.
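For reference, the query semantics over the fact table can be written as a brute-force scan. This sketch assumes regions are addressed by name and timestamps are 1-based; the helper `st_window_sum` is hypothetical, and because the text does not say which rectangles qs intersects, the call below assumes R1 and R3 purely for illustration.

```python
from typing import Dict, Iterable, List

# Fact table of the example: region -> aggregate at T1..T5 (T5 = now).
FACT: Dict[str, List[int]] = {
    "R1": [150, 150, 145, 135, 130],
    "R2": [75, 80, 85, 90, 90],
    "R3": [132, 127, 125, 127, 127],
    "R4": [12, 12, 12, 12, 12],
}


def st_window_sum(fact: Dict[str, List[int]], regions: Iterable[str], t1: int, t2: int) -> int:
    """Sum the aggregates of `regions` over timestamps t1..t2 (1-based, inclusive)."""
    return sum(sum(fact[r][t1 - 1:t2]) for r in regions)


# Margins of the table: per-region and per-timestamp sums.
row_totals = {r: sum(v) for r, v in FACT.items()}       # R1: 710, R2: 420, R3: 638, R4: 60
col_totals = [sum(col) for col in zip(*FACT.values())]  # [369, 369, 367, 364, 359]

# Hypothetical: assume qs intersects R1 and R3 (the text does not say which regions it hits).
print(st_window_sum(FACT, ["R1", "R3"], 1, 4))          # 580 + 511 = 1091
```

A scan like this touches every cell in the queried range, which is the cost the index structures below try to avoid.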
Traditional Methods
• Pre-materialization
– Even more difficult than in spatial DWs, because of the additional temporal dimension.
• Use an aggregation tree.
– Whenever the aggregate of a region changes, create a new 3D box. An aggregate 3D R-tree indexes all these boxes.
– Problem: the spatial extent of a region is duplicated in every box, even though it does not change.
[Figure: 3D boxes for region R1, one per aggregate value: 150 (from T1), 145 (from T3), 135 (from T4), 130 (from T5)]
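The 3D boxes in the figure can be sketched by closing a box whenever a region's aggregate changes. This assumes the aggregate is known at every timestamp; `Box3D`, `boxes_for_region` and R1's coordinates are hypothetical. Every box repeats R1's spatial extent, which is exactly the duplication the slide points out.

```python
from typing import List, NamedTuple, Tuple


class Box3D(NamedTuple):
    extent: Tuple[float, float, float, float]  # (x1, y1, x2, y2), copied into every box
    t_start: int
    t_end: int  # inclusive; a real index would leave the newest box open until "now"
    agg: float


def boxes_for_region(extent, series: List[float], t0: int = 1) -> List[Box3D]:
    """Create one 3D box per maximal run of an unchanged aggregate value."""
    boxes: List[Box3D] = []
    start = t0
    for i in range(1, len(series) + 1):
        # Close the current run at the end of the series or when the value changes.
        if i == len(series) or series[i] != series[i - 1]:
            boxes.append(Box3D(extent, start, t0 + i - 1, series[i - 1]))
            start = t0 + i
    return boxes


# Hypothetical extent for R1; aggregates T1..T5 from the fact table.
r1_boxes = boxes_for_region((0, 0, 2, 2), [150, 150, 145, 135, 130])
for b in r1_boxes:
    print(b.t_start, b.t_end, b.agg)  # 1 2 150 / 3 3 145 / 4 4 135 / 5 5 130
```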
Aggregate RB-tree
[Figure: aggregate RB-tree for the fact table of the static example. An R-tree indexes the spatial dimension, with leaf entries R1 to R4 and intermediate entries R5 (enclosing R1, R2) and R6 (enclosing R3, R4). Each entry stores its aggregate over the whole history (R1 710, R2 420, R3 638, R4 60, R5 1130, R6 698) and points to a B-tree on its aggregate history; a separate B-tree covers the whole space. A leaf entry (t, v) means the aggregate becomes v at timestamp t, and an upper entry sums the aggregates over the timestamps it covers. For R1, the leaves are (1, 150), (3, 145), (4, 135), (5, 130) and the root is (1, 445), (4, 265).]
Spatial extents are stored only once.
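The per-region B-trees can be sketched as a two-level structure. The sketch assumes a node capacity of two and that each aggregate stays valid until the next change (or until now); `build_agg_btree`, `interval_sum` and `RootEntry` are hypothetical names. Under these assumptions the root entries for R1 come out as (1, 445) and (4, 265), matching the figure, and an interval query reads a root entry's sum directly whenever the entry lies inside the query interval.

```python
from dataclasses import dataclass
from typing import List, Tuple

Run = Tuple[int, int, int]  # (start, end, value): value holds for timestamps start..end


@dataclass
class RootEntry:
    start: int
    end: int
    agg: int  # sum of value * duration over the runs in the child leaf
    leaf: List[Run]


def build_agg_btree(changes: List[Tuple[int, int]], now: int, capacity: int = 2) -> List[RootEntry]:
    """Two-level aggregate B-tree over (timestamp, new value) change points."""
    runs: List[Run] = []
    for i, (s, v) in enumerate(changes):
        e = changes[i + 1][0] - 1 if i + 1 < len(changes) else now
        runs.append((s, e, v))
    root = []
    for k in range(0, len(runs), capacity):
        leaf = runs[k:k + capacity]
        agg = sum(v * (e - s + 1) for s, e, v in leaf)
        root.append(RootEntry(leaf[0][0], leaf[-1][1], agg, leaf))
    return root


def overlap(s: int, e: int, t1: int, t2: int) -> int:
    """Number of timestamps shared by [s, e] and [t1, t2]."""
    return max(0, min(e, t2) - max(s, t1) + 1)


def interval_sum(root: List[RootEntry], t1: int, t2: int) -> int:
    total = 0
    for ent in root:
        if t1 <= ent.start and ent.end <= t2:
            total += ent.agg  # entry covered entirely: its leaf is not accessed
        elif overlap(ent.start, ent.end, t1, t2):
            total += sum(v * overlap(s, e, t1, t2) for s, e, v in ent.leaf)
    return total


r1 = build_agg_btree([(1, 150), (3, 145), (4, 135), (5, 130)], now=5)
print([(ent.start, ent.agg) for ent in r1])  # [(1, 445), (4, 265)]
print(interval_sum(r1, 1, 4))                # 445 + 135 = 580
```

In a complete query, the R-tree would be traversed as in the aggregation R-tree, and each entry lying fully inside qs would contribute the `interval_sum` of its B-tree rather than its whole-history total.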
Example (Dynamic Objects)
[Fact table: identical to the static example]
Query qs retrieves the aggregate sum, during T1-T4, of all rectangles that intersect it.
[Figure: regions R1 to R4 and query window qs; situation during timestamps 1-4]
Example (cont.)
[Figure: regions R1 to R4 and query window qs; regions change position at timestamp 5]
Aggregate HRB-tree
• Integrates the previous idea with the HR-tree, a spatio-temporal access method.
[Figure: aggregate HRB-tree. The R-tree for timestamps 1-4 has root entries R5 (enclosing R1, R2) and R6 (enclosing R3, R4); the R-tree for timestamp 5 has root entries R'5 (enclosing R'1, R2) and R6. Nodes and B-trees that do not change (e.g., the subtree of R6 and the B-tree for R2) are shared by both trees, and each entry points to a B-tree on its aggregate history.]
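The node sharing inherited from HR-trees can be illustrated with path copying: when R1 moves at timestamp 5, only the nodes on the path from the root to R1 are copied, and every other node is reused by the new version. This Python sketch omits MBRs, aggregates and B-trees; `HNode`, `update_path` and the version keys are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class HNode:
    label: str
    children: Tuple["HNode", ...] = ()


def update_path(node: HNode, path: Tuple[int, ...], new_leaf: HNode) -> HNode:
    """Return a new version of `node` in which the entry reached by `path`
    (child indices from the root) is replaced; only nodes on that path are copied."""
    if not path:
        return new_leaf
    kids = list(node.children)
    kids[path[0]] = update_path(kids[path[0]], path[1:], new_leaf)
    return replace(node, children=tuple(kids))  # new node; untouched children are shared


r5 = HNode("R5", (HNode("R1"), HNode("R2")))
r6 = HNode("R6", (HNode("R3"), HNode("R4")))
versions = {"1-4": HNode("root", (r5, r6))}
# At timestamp 5 R1 moves: the root and R5 are copied (the copy of R5 plays the role of R'5).
versions["5"] = update_path(versions["1-4"], (0, 0), HNode("R'1"))
```

After the update, the R6 subtree and the R2 entry are the same objects in both versions, so only two new nodes are created.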
Aggregate 3D RB-tree
• Creates a 3D box only when the spatial extent of an object changes.
[Figure: aggregate 3D RB-tree. 3D boxes for R1 to R6 along the time axis, with boxes for R'1 and R'5 starting at time 5; each box points to a B-tree (B-trees for R1 to R6, R'1 and R'5) holding the aggregate history during the box's lifespan.]
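The box-creation rule can be sketched as follows, assuming a region's extent and aggregate are sampled at every timestamp; `Box`, `boxes_on_extent_change` and the moved extent of R1 at T5 are hypothetical. Aggregate changes that occur while the extent stays the same are appended to the current box's history (standing in for its B-tree) instead of producing a new box.

```python
from typing import List, NamedTuple, Tuple

Extent = Tuple[float, float, float, float]


class Box(NamedTuple):
    extent: Extent
    t_start: int
    t_end: int                      # inclusive
    history: List[Tuple[int, int]]  # (timestamp, new aggregate): contents of the box's B-tree


def boxes_on_extent_change(extents: List[Extent], aggs: List[int], t0: int = 1) -> List[Box]:
    """One 3D box per maximal run of an unchanged extent; aggregate changes
    inside the run go to that box's history instead of creating new boxes."""
    boxes: List[Box] = []
    for i, (ext, a) in enumerate(zip(extents, aggs)):
        t = t0 + i
        if i == 0 or ext != extents[i - 1]:
            boxes.append(Box(ext, t, t, [(t, a)]))  # new extent: new box
        else:
            last = boxes[-1]
            if a != aggs[i - 1]:
                last.history.append((t, a))         # aggregate change: B-tree entry only
            boxes[-1] = last._replace(t_end=t)      # extend the box in time (history list is shared)
    return boxes


A, B = (0, 0, 2, 2), (1, 1, 3, 3)  # hypothetical extents of R1 before and after it moves
r1 = boxes_on_extent_change([A, A, A, A, B], [150, 150, 145, 135, 130])
for b in r1:
    print(b.extent, b.t_start, b.t_end, b.history)
```

Only two boxes are created for R1, one per extent, while its three aggregate changes during T1-T4 stay inside the first box's history.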
Managing Numerous B-trees
• If each B-tree is too small (i.e., when spatial extents change about as often as aggregates):
– A disk block contains too few entries, so much space is wasted.
– The trees are not suitable for caching.
• Our solution is to use a B-File, which “packs” numerous B-trees into a single file:
– It avoids empty space in disk pages.
– It maintains the same query performance.
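The space argument can be illustrated with a simple next-fit packing. This does not reproduce the B-File organization, which the slide does not describe: the page capacity, the tree sizes and the assumption that each small B-tree fits in one page are all hypothetical, and `pages_one_tree_per_page` and `pack_next_fit` are illustrative names.

```python
from typing import List


def pages_one_tree_per_page(tree_sizes: List[int], page_capacity: int) -> int:
    """Pages used when every small B-tree gets its own page(s)."""
    return sum(-(-n // page_capacity) for n in tree_sizes)  # ceiling division


def pack_next_fit(tree_sizes: List[int], page_capacity: int) -> List[List[int]]:
    """Place small B-trees (each assumed to fit in one page) consecutively in
    shared pages, opening a new page only when the next tree does not fit."""
    pages: List[List[int]] = []
    used = page_capacity  # forces a new page for the first tree
    for n in tree_sizes:
        if n > page_capacity:
            raise ValueError("this sketch only handles trees that fit in one page")
        if used + n > page_capacity:
            pages.append([])
            used = 0
        pages[-1].append(n)
        used += n
    return pages


sizes = [1, 2, 1, 3, 1]                   # hypothetical entry counts of five tiny B-trees
print(pages_one_tree_per_page(sizes, 4))  # 5
print(pack_next_fit(sizes, 4))            # [[1, 2, 1], [3, 1]]
```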
Performance
• Dataset settings
– Number of spatial objects = 10,000
– History length = 1,000 timestamps
– Aggregate agility – describes how fast the aggregate information changes (4%, 8%, 16%, 32%, 64%)
– Region agility – describes how fast the spatial extents change
• 0% for static objects
• 0.01% for dynamic objects (capturing the fact that spatial dimension changes much slower than the aggregate data)
– Datasets include 500,000 to 6,500,000 records.
• Each query has two parameters: spatial extent and interval length.
Results (Static Objects)
[Chart: size (megabytes) vs. aggregate agility (4% to 64%) for aRB and a3DR]
[Chart: node accesses vs. aggregate agility (4% to 64%) for aRB and a3DR]
Results (Static Objects)
[Chart: node accesses vs. query length (1 to 100) for aRB and a3DR]
[Chart: node accesses vs. query extent (1% to 9%) for aRB and a3DR]
Results (Dynamic Objects)
[Chart: size (megabytes) vs. aggregate agility (4% to 64%) for a3DR, a3DRB and aHRB]
[Chart: node accesses vs. aggregate agility (4% to 64%) for a3DR, a3DRB and aHRB]
Results (Dynamic Objects)
[Chart: node accesses vs. query length (1 to 100) for a3DR, a3DRB and aHRB]
[Chart: node accesses vs. query extent (1% to 9%) for a3DR, a3DRB and aHRB]
Conclusion
• We propose indexing techniques that replace the data cube in spatio-temporal data warehouses and answer ad-hoc group-by queries very efficiently.
– Both static and dynamic spatial dimensions are discussed.
• Extensions
– Cost models that predict the performance of alternative structures.
– Query optimization based on the cost models.
– Complex query evaluation