ACM GIS 2007
An Interactive Framework for Raster Data Spatial Joins
Wan Bae (Computer Science, University of Denver)
Petr Vojtěchovský (Mathematics, University of Denver)
Shayma Alkobaisi (Computer Science, University of Denver)
Scott T. Leutenegger (Computer Science, University of Denver)
Seon Ho Kim (Computer Science, University of Denver)
Outline
Introduction
Issues and Problems
Probabilistic Joins
Sampling Joins
Interactive Framework
Experiments
Conclusion
Geographic Information Systems
[Figure: users, web application, GIS, and data sources]
• Collect
• Store
• Retrieve
• Integration of georeferenced data
• Spatial queries
• Complex spatial data analysis & modeling for decision support
Raster Data Model
(a) Satellite Image
[Figure: a 10 x 10 grid with rows and columns numbered 0 to 9, whose cells carry the labels R, T, and H]
(b) Raster Model
• A great portion of georeferenced data
• Simple data structure but greater storage space
• Continuously changing data
Continuously Changing Data
Raster Data Spatial Joins
[Figure: panels (a) and (b)]
“Find the regions where rainfall rate is greater than 1.0 and wind speed is greater than 50”
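The query above can be read as a cell-by-cell join of two aligned rasters. The following is a minimal sketch under that assumption; the function name `raster_join`, the list-of-lists raster layout, and the sample values are illustrative and do not come from the slides.

```python
from typing import List, Tuple

Raster = List[List[float]]  # hypothetical layout: raster[row][col]


def raster_join(rain: Raster, wind: Raster,
                rain_min: float = 1.0,
                wind_min: float = 50.0) -> List[Tuple[int, int]]:
    """Return (row, col) of every cell where both predicates hold."""
    if len(rain) != len(wind) or any(len(r) != len(w) for r, w in zip(rain, wind)):
        raise ValueError("rasters must have the same shape")
    return [(i, j)
            for i, (rain_row, wind_row) in enumerate(zip(rain, wind))
            for j, (r, w) in enumerate(zip(rain_row, wind_row))
            if r > rain_min and w > wind_min]


# Made-up 2 x 2 example data.
rain = [[0.5, 1.2], [2.0, 0.9]]
wind = [[60.0, 55.0], [40.0, 70.0]]
print(raster_join(rain, wind))  # [(0, 1)]
```

This exact join visits every cell, which is the cost that the approximate joins in the rest of the talk try to avoid.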
Issues for User-driven Data Exploration
Fast query response time
– Exact answers are time-consuming because the data sets are large
– GIS decision support queries are time-intensive
– Optimization and approximation techniques for raster data joins are lacking
Interactive query processing
– Traditional GIS offer little interactivity
– Users have no control over query processing
Visualization increases the utility of the GIS
Our Approach
For faster and more effective decision support queries:
Fast approximation of query results
1. probabilistic join
2. sampling join
Visualize intermediate results
1. “big picture” of query result
2. partial result: non-blocking joins
Allow users to control query processing
Our Approximations
1. What is the probability that R joins S?
R (8/16), S (9/16): they must join, because 8 + 9 > 16, so the cells covered by R and by S cannot be disjoint.
2. Can we use the result of a subset of data cell joins for the final answer?
1 join / 2 cells → ? / 16 cells
Augmented Quad-trees
Both data sets are indexed using Quad-trees
[Figure: two quad-trees, one per data set; each internal node has four children NW, NE, SW, SE]
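The slides do not spell out what the quad-trees are augmented with. The sketch below assumes that each node stores the number of cells in its region that satisfy the query predicate, so that a fraction such as 8/16 can be read off at any level. The class name `QuadNode`, the function `build`, and the sample grid are hypothetical, and the grid is assumed to be square with a power-of-two side.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QuadNode:
    size: int   # side length of the square region, in cells
    hits: int   # cells in the region that satisfy the predicate
    children: List["QuadNode"] = field(default_factory=list)  # NW, NE, SW, SE

    @property
    def fraction(self) -> float:
        """Share of the region's cells that satisfy the predicate."""
        return self.hits / (self.size * self.size)


def build(grid: List[List[bool]], row: int = 0, col: int = 0,
          size: Optional[int] = None) -> QuadNode:
    """Recursively build a quad-tree over a square, power-of-two grid."""
    if size is None:
        size = len(grid)
    if size == 1:
        return QuadNode(1, int(grid[row][col]))
    half = size // 2
    kids = [build(grid, row, col, half),                # NW
            build(grid, row, col + half, half),         # NE
            build(grid, row + half, col, half),         # SW
            build(grid, row + half, col + half, half)]  # SE
    return QuadNode(size, sum(k.hits for k in kids), kids)


# Made-up 4 x 4 predicate mask.
grid = [[True,  False, False, False],
        [True,  True,  False, False],
        [False, False, False, True],
        [False, False, True,  True]]
root = build(grid)
print(root.hits, root.fraction)   # 6 0.375
print(root.children[0].fraction)  # NW quadrant: 0.75
```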
Join Probability
Let X = [0, 1], and let m and n be randomly chosen intervals in X of lengths a and b. We want the probability p that m ∩ n ≠ ∅.
Join probability p(m ∩ n ≠ ∅) = ?
1-d Join Probability
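[Figure: the unit interval [0, 1] with interval m = [a1, a2] of length a and interval n = [b1, b2] of length b; n starts at x and ends at x + b, and the two intervals overlap]
With x, the start of n, uniform on [0, 1 − b], and the start of m uniform on [0, 1 − a]:
$$p(a, b) = \frac{1}{(1-a)(1-b)} \int_{0}^{1-b} \left( \min\{1-a,\; x+b\} - \max\{0,\; x-a\} \right) dx$$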
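As a sanity check on the 1-d case, the sketch below assumes that the start points of m and n are uniform on [0, 1 − a] and [0, 1 − b]. Under that assumption the probability has the closed form 1 − (1 − a − b)² / ((1 − a)(1 − b)) when a + b < 1, and equals 1 otherwise. The function names are illustrative, and the Monte Carlo routine is only there to cross-check the closed form.

```python
import random


def join_probability_1d(a: float, b: float) -> float:
    """Probability that two random intervals of lengths a and b in [0, 1] overlap."""
    if a + b >= 1.0:
        return 1.0  # there is not enough room for the intervals to miss each other
    return 1.0 - (1.0 - a - b) ** 2 / ((1.0 - a) * (1.0 - b))


def simulate_1d(a: float, b: float, trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a1 = rng.uniform(0.0, 1.0 - a)  # start of m
        b1 = rng.uniform(0.0, 1.0 - b)  # start of n
        if a1 <= b1 + b and b1 <= a1 + a:  # closed intervals intersect
            hits += 1
    return hits / trials


print(join_probability_1d(0.2, 0.3))
print(simulate_1d(0.2, 0.3))
```

The two printed values should agree to about two decimal places; the simulated one carries sampling noise.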
2-d Join Probability
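[Figure: rectangle m with sides a1 and a2 and area a; rectangle n with sides b1 and b2 and area b]
$$p(a, b) = \frac{1}{(1-a)(1-b)} \int_{a}^{1} \int_{b}^{1} p_1(a_1, b_1)\, p_1\!\left(\frac{a}{a_1}, \frac{b}{b_1}\right) db_1\, da_1$$
where p_1 is the 1-d join probability, and a and b are the areas of m and n.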
Look-up table for 2-d Join Probability
P 0.1 0.2 0.3 0.4 0.5
0.1 0.4636 0.6228 0.7414 0.8317 0.8997
0.2 0.6228 0.7683 0.8640 0.9277 0.9681
0.3 0.7414 0.8640 0.9343 0.9738 0.9930
0.4 0.8317 0.9277 0.9738 0.9937 0.9995
0.5 0.8997 0.9681 0.9930 0.9995 1.0
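A table like this one can be approximated numerically. The sketch below assumes that the row and column labels are the areas a and b of the two rectangles, and that the 2-d probability averages the product of two 1-d join probabilities over side lengths a1 in [a, 1] and b1 in [b, 1], with the other sides being a/a1 and b/b1. The function names, the midpoint rule, and the grid resolution `n` are choices made for this illustration, not taken from the slides.

```python
def p1(a: float, b: float) -> float:
    """1-d join probability for interval lengths a and b (uniform start points)."""
    if a + b >= 1.0:
        return 1.0
    return 1.0 - (1.0 - a - b) ** 2 / ((1.0 - a) * (1.0 - b))


def join_probability_2d(a: float, b: float, n: int = 200) -> float:
    """Midpoint-rule average of p1(a1, b1) * p1(a/a1, b/b1) over [a, 1] x [b, 1]."""
    ha = (1.0 - a) / n
    hb = (1.0 - b) / n
    total = 0.0
    for i in range(n):
        a1 = a + (i + 0.5) * ha      # one side of m; the other side is a / a1
        for j in range(n):
            b1 = b + (j + 0.5) * hb  # one side of n; the other side is b / b1
            total += p1(a1, b1) * p1(a / a1, b / b1)
    # Dividing by n * n applies both the cell area ha * hb and the
    # normalization 1 / ((1 - a)(1 - b)).
    return total / (n * n)


for area in (0.4, 0.5):
    print(area, round(join_probability_2d(area, area), 4))
```

Under these assumptions the diagonal entry for area 0.5 comes out as 1.0, and the entry for area 0.4 lands close to the tabulated 0.9937.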
Probabilistic Join (PJ)
[Figure: join probabilities at two quad-tree levels]
Level with 4 cells: p(2/4, 2/4)
Level with 16 cells: p(8/16, 9/16)
Probabilistic Join Result
(a) data set Q (65536 x 65536)
(b) data set S (65536 x 65536)
(c) 2nd level joins
(d) 3rd level joins
(e) 4th level joins
Incremental Stratified Sampling Join (ISSJ)
Uses stratified random sampling over the quad-trees of the two data sets R and S
Data randomization: Acceptance/Rejection method
1. Sampling step: sample data cells from the outer data set R
2. Spatial joining step: join each sample with the corresponding data cell of the inner data set S
3. Refining step: update running estimates and confidence intervals
4. Visualization: display partial results (actual join results)
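The four steps can be sketched as a generator that yields a refined estimate after every batch, which is what makes the join non-blocking. This is a simplified illustration, not the authors' algorithm: it uses the four top-level quadrants as strata, shuffles cells within each stratum instead of the Acceptance/Rejection method, and bounds the estimate with a normal-approximation interval plus a finite-population correction. All names and the sample data are hypothetical.

```python
import math
import random
from typing import Iterator, List, Tuple

Grid = List[List[bool]]  # True where a cell satisfies that data set's predicate
Cell = Tuple[int, int]


def quadrant_strata(n: int) -> List[List[Cell]]:
    """Split an n x n grid (n even) into its four quadrants."""
    half = n // 2
    return [[(r, c) for r in range(r0, r0 + half) for c in range(c0, c0 + half)]
            for r0 in (0, half) for c0 in (0, half)]


def incremental_sampling_join(R: Grid, S: Grid, batch: int = 4, z: float = 1.96,
                              seed: int = 0) -> Iterator[Tuple[int, float, float, List[Cell]]]:
    """Yield (cells processed, estimated joins, CI half-width, partial result)."""
    n = len(R)
    if n % 2:
        raise ValueError("this sketch needs an even grid side")
    rng = random.Random(seed)
    strata = quadrant_strata(n)
    for cells in strata:
        rng.shuffle(cells)  # random sampling order inside each stratum
    total, processed, pos = n * n, 0, 0
    joined: List[Cell] = []
    while processed < total:
        for cells in strata:
            for r, c in cells[pos:pos + batch]:  # 1. sampling step (outer set R)
                processed += 1
                if R[r][c] and S[r][c]:          # 2. joining step (inner set S)
                    joined.append((r, c))
        pos += batch
        p_hat = len(joined) / processed          # 3. refining step
        fpc = (total - processed) / (total - 1)  # finite-population correction
        half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / processed * fpc)
        yield processed, p_hat * total, half_width * total, list(joined)  # 4. display


R = [[True, True, False, False], [True, False, False, True],
     [False, True, True, False], [True, True, False, True]]
S = [[True, False, False, True], [True, True, False, True],
     [False, False, True, False], [False, True, True, True]]
for done, estimate, half_width, partial in incremental_sampling_join(R, S, batch=1):
    print(f"{done}/16 cells: about {estimate:.1f} joins (+/- {half_width:.1f})")
```

A caller can stop consuming the generator as soon as the interval is narrow enough, which corresponds to the user control described in the framework.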
Stratified Random Sampling
[Figure: the data set partitioned into four strata, ST1 to ST4]
Estimates and Confidence Interval
Population proportion: the fraction of the population that has a particular property of interest
Estimated value: the statistic computed from the sample to estimate the population proportion
Confidence interval: a range of values that contains the population parameter with a specified probability
Confidence level: that specified probability
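To make these terms concrete, the sketch below computes an estimated value and a confidence interval for a population proportion. The slides do not say which interval the authors use; this example assumes the standard normal-approximation interval, with z = 1.96 for a 95% confidence level. The function name and the sample numbers are made up.

```python
import math
from typing import Tuple


def proportion_interval(successes: int, n: int, population: int,
                        z: float = 1.96) -> Tuple[float, Tuple[float, float]]:
    """Scale a sample proportion up to the population, with a confidence interval."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p_hat = successes / n                               # sample proportion
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)     # half-width on the proportion
    low = max(0.0, p_hat - half) * population
    high = min(1.0, p_hat + half) * population
    return p_hat * population, (low, high)


# Hypothetical sample: 30 joining cells among 200 sampled, out of 2000 cells.
estimate, (low, high) = proportion_interval(30, 200, 2000)
print(f"estimate {estimate:.0f}, 95% interval [{low:.0f}, {high:.0f}]")
```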
Incremental Sampling Join Result
(a) Estimated result
state  airports  confidence (%)  interval
IA     13        95              0.05
NE     22        95              0.05
WI     19        95              0.05
CO     15        95              0.05
KS     11        95              0.05
MI     8         95              0.05
(b) Partial result
10% done
Interactive Join Framework
Experiments
PJ and ISSJ compared to the full quad-tree join
Confidence level set to 95% in ISSJ
Varied buffer size and data set size
Data sets:
– Synthetic: U E, E U, U U (65536 x 65536 and 262144 x 262144)
– Real: 6 mineral resource data sets for each of the states AZ, CO, OR and WY, from the U.S. Geological Survey (65536 x 65536)
Actual joins vs. 2-d PJ
sample size  actual joins  2-d PJ (error)
5%           54            48 (0.1060)
10%          109           99 (0.0917)
20%          218           197 (0.0963)
50%          545           494 (0.0936)
Accuracy of Estimates of ISSJ
[Figure: estimates vs. exact value for real data sets; x-axis: number of processed cells]
Time for Confidence Interval of ISSJ
[Figure: confidence interval and I/Os for real data sets; series: sampling join, full quad-tree join]
ISSJ vs. PJ vs. Actual joins
(a) ISSJ w/10% CI
(b) ISSJ w/5% CI
(c) Actual join
(d) PJ
Time for Confidence Intervals
I/Os of PJ, ISSJ and the full quad-tree join for Colorado
Conclusion
Probabilistic Join (PJ): a novel spatial join for raster data that gives a “big picture” visualization of the query answer
Incremental Stratified Sampling Join (ISSJ): an interactive raster spatial join algorithm that returns estimated query answers bounded by confidence intervals
Thank you!