Leonardo Guerreiro Azevedo, Geraldo Zimbrão, Jano Moreira de Souza
Approximate Query Processing in Spatial Databases Using Raster Signatures
Federal University of Rio de Janeiro
{azevedo, zimbrao,jano}@cos.ufrj.br
Presentation plan
– First considerations
– Goals and contributions
– Four-Color Raster Signature (4CRS)
– Proposals of algorithms
– Final considerations
Motivation
There are many cases where a query can take a long time to be processed, for example:
– When processing a huge volume of data that requires a large number of I/O operations
  • Disk access time is still higher than memory access time
– When processing highly complex queries
– When accessing remote data over a slow network link, or when the data is temporarily unavailable
An exact answer can demand a long time.
A fast answer can be more important than an exact response.
Motivation
The challenge becomes bigger in spatial data environments.
[Figure: two example objects, labeled "399,0000 segments" and "475,434 segments"]
Motivation
The precision of the query can be lessened, and an approximate answer returned to the user
– Approximate answers can be quickly computed
– Acceptable precision
Motivation
There are many approaches in the approximate query processing field; however, most of them are not suitable for spatial data.
“Research new techniques for approximate query processing that support the uniqueness of spatial data is a major issue in the database field”. (Roddick et al., 2004)
Scenarios and Applications
Decision Support System
– Increasing business competitiveness
– More use of accumulated data
Data mining
– During drill down query sequence in ad-hoc data mining
– Earlier queries in a sequence can be used to find out the interesting queries.
Data warehouse
– Performance and scalability when accessing very large volumes of data during the analysis process.
Scenarios and Applications
Mobile computing
An approximate answer may be an alternative:
– When the data is not available
– To save storage space
Traditional SDBMS query processing environment
[Diagram: queries are sent to the Spatial DBMS, which returns exact answers (slow); new data (inserts or updates) and deleted data are applied to the Spatial DBMS]
SDBMS set-up for providing approximate query answers
[Diagram: queries are sent to an Approximate Query Processing Engine, which returns an approximate answer plus a confidence interval (fast answer); the Spatial DBMS provides the exact answer; new data (inserts or updates) and deleted data are applied to the Spatial DBMS]
Goals
Execute approximate query processing in Spatial Databases using Raster Signature
– Four-Color Raster Signature (4CRS) (Zimbrao and Souza, 1998).
Provide fast approximate query answers for queries over spatial data.
Contributions
Proposals of algorithms for many spatial operations that can be approximately processed using 4CRS
– Spatial operators returning numbers
  • Area, distance, diameter, perimeter…
– Spatial predicates
  • Equal, different, disjoint, area disjoint, inside, meet, adjacent…
– Operators returning spatial data type values
  • Intersection, plus (union), minus, common border…
– Spatial operators on set of objects
  • Sum, closest, decompose, overlay, fusion.
Contributions
Proposals of algorithms
– Approximate Area of Polygon
– Distance
– Diameter
– Perimeter and Contour
– Equal and Different
– Disjoint, Area Disjoint, Edge Disjoint
– Inside (Encloses), Edge Inside, Vertex Inside
– Intersects and Intersection
– Overlay
– Adjacent, Border in Common, Common border
– Plus and Sum
– Minus
– Fusion
– Closest
– Decompose
Four-Color Raster Signature (4CRS)
4CRS is a raster approximation
– It is an object representation upon a grid of cells
Grid resolution can be changed
– Precision × Storage requirements
Four-Color Raster Signature (4CRS)
Each cell stores relevant information using few bits
4CRS: 4 types of cells
Bit value | Cell type | Description
00 | Empty | The cell is not intersected by the polygon
01 | Weak | The cell contains an intersection of 50% or less with the polygon
10 | Strong | The cell contains an intersection of more than 50% and less than 100% with the polygon
11 | Full | The cell is fully occupied by the polygon
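The cell-type table can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the names `CellType`, `classify_cell` and `pack_cells` are hypothetical, the input is assumed to be the fraction of the cell area covered by the polygon, and the packing of four 2-bit cells per byte is only one possible storage layout.

```python
from enum import IntEnum


class CellType(IntEnum):
    """2-bit cell codes, following the bit values in the table above."""
    EMPTY = 0b00
    WEAK = 0b01
    STRONG = 0b10
    FULL = 0b11


def classify_cell(coverage: float) -> CellType:
    """Map the covered fraction of a cell (0.0 to 1.0) to its cell type.

    Assumption: a cell with zero covered area is treated as Empty.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be in [0, 1]")
    if coverage == 0.0:
        return CellType.EMPTY
    if coverage <= 0.5:
        return CellType.WEAK
    if coverage < 1.0:
        return CellType.STRONG
    return CellType.FULL


def pack_cells(cells) -> bytes:
    """Pack cell types into bytes, four 2-bit cells per byte (hypothetical layout)."""
    out = bytearray((len(cells) + 3) // 4)
    for i, cell in enumerate(cells):
        out[i // 4] |= int(cell) << (2 * (i % 4))
    return bytes(out)
```

With this layout a signature of n cells needs about n/4 bytes, which illustrates the precision versus storage trade-off of the grid resolution.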
Four-Color Raster Signature (4CRS) - Generation
[Figure: a polygon and its 4CRS signature]
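The slides do not describe how the signature is generated, so the following is only a sketch of one possible approach. It is an assumption, not the authors' method: each cell rectangle is used to clip the polygon with Sutherland-Hodgman passes, the clipped area is computed with the shoelace formula, and the covered fraction selects the cell letter (E, W, S, F). All function names and the parameters `x0`, `y0`, `cell`, `nx`, `ny` are hypothetical; row 0 is the row with the lowest y.

```python
def polygon_area(points):
    """Shoelace formula; returns 0.0 for empty or degenerate input."""
    pairs = zip(points, points[1:] + points[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2.0


def clip_half_plane(points, axis, bound, keep_greater):
    """One Sutherland-Hodgman pass: keep coord >= bound (or <= bound) on an axis."""
    def inside(p):
        return p[axis] >= bound if keep_greater else p[axis] <= bound

    def crossing(a, b):
        t = (bound - a[axis]) / (b[axis] - a[axis])
        return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

    out = []
    for i in range(len(points)):
        a, b = points[i - 1], points[i]
        if inside(b):
            if not inside(a):
                out.append(crossing(a, b))
            out.append(b)
        elif inside(a):
            out.append(crossing(a, b))
    return out


def cell_letter(fraction, eps=1e-9):
    """Covered fraction -> E, W, S or F (eps absorbs rounding error)."""
    if fraction <= eps:
        return "E"
    if fraction <= 0.5:
        return "W"
    if fraction < 1.0 - eps:
        return "S"
    return "F"


def generate_signature(polygon, x0, y0, cell, nx, ny):
    """Return the signature as a list of strings, one string per grid row."""
    rows = []
    for row in range(ny):
        letters = ""
        for col in range(nx):
            xmin, ymin = x0 + col * cell, y0 + row * cell
            piece = clip_half_plane(polygon, 0, xmin, True)
            piece = clip_half_plane(piece, 0, xmin + cell, False)
            piece = clip_half_plane(piece, 1, ymin, True)
            piece = clip_half_plane(piece, 1, ymin + cell, False)
            letters += cell_letter(polygon_area(piece) / (cell * cell))
        rows.append(letters)
    return rows
```

For example, the triangle with vertices (0,0), (4,0), (0,4) on a 2×2 grid of cells with side 2 yields one Full cell, two Weak cells and one Empty cell. Cells that the polygon touches only along a boundary, with zero overlapping area, are classified as Empty in this sketch.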
Approximate Area of Polygon
– Approximate area of polygon
  • Based on the expected area of polygon within cell
– Approximate area of polygon within window
  • Based on the expected area of polygon within cell
– Approximate overlapping area of polygon join
  • Based on the expected intersection area of two types of cells
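Approximate area of polygon
Expected area (µ) of cell type
– Empty (E): expected area = 0%, µ = 0
– Weak (W): expected area in (0, 0.50], µ = 0.25
– Strong (S): expected area in (0.50, 1), µ = 0.75
– Full (F): expected area = 100%, µ = 1
Approximate area of polygon within cell
$$\text{Approximate answer} = \sum_{t} \mu_t \times \text{cell area}$$
where the sum runs over the cells of the signature and $\mu_t$ is the expected area of the type $t$ of each cell.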
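The area estimate can be sketched as below. This is a hedged illustration: it assumes the signature is given as a list of strings over the letters E, W, S, F (a hypothetical encoding), that the sum runs over all cells of the signature, and that a query window is aligned with the grid so it can be described by row and column ranges.

```python
# Expected area (µ) per cell type, as listed on the slide.
EXPECTED_AREA = {"E": 0.0, "W": 0.25, "S": 0.75, "F": 1.0}


def approximate_area(signature, cell_area):
    """Sum the expected area of every cell and scale by the area of one cell."""
    return sum(EXPECTED_AREA[c] for row in signature for c in row) * cell_area


def approximate_area_in_window(signature, cell_area, rows, cols):
    """Same estimate restricted to the cells inside a grid-aligned window."""
    return sum(EXPECTED_AREA[signature[r][c]] for r in rows for c in cols) * cell_area
```

For the signature `["FW", "WE"]` with cells of area 4, the estimate is (1 + 0.25 + 0.25 + 0) × 4 = 6.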
Grid and polygon are independent from each other
Approximate overlapping area of polygon join
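[Figure: expected area of cells overlapping, shown for the pairs W×E, S×E, S×W and S×S as µW×E, µS×E, µS×W and µS×S]
$$\text{Approximate answer} = \sum_{i,j} \mu_{i \times j} \times \text{cell area}$$
where the sum runs over the pairs of overlapping cells and $\mu_{i \times j}$ is the expected overlapping area for cell types $i$ and $j$.
Cell types | Empty | Weak | Strong | Full
Empty | 0 | 0 | 0 | 0
Weak | 0 | 0.0625 | 0.1875 | 0.25
Strong | 0 | 0.1875 | 0.5625 | 0.75
Full | 0 | 0.25 | 0.75 | 1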
Table of expected area of cells overlapping
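The overlapping-area estimate for a polygon join can be sketched as follows. Each entry of the table equals the product of the per-type expected areas (for example, 0.25 × 0.75 = 0.1875 for Weak × Strong), so the sketch computes the entries instead of storing them. It assumes, hypothetically, that both signatures are lists of strings over E, W, S, F and that they are defined over the same aligned grid.

```python
# Per-type expected area; the table entries are products of these values.
MU = {"E": 0.0, "W": 0.25, "S": 0.75, "F": 1.0}


def expected_overlap(type_a, type_b):
    """Expected overlapping area (fraction of a cell) for two cell types."""
    return MU[type_a] * MU[type_b]


def approximate_overlap_area(sig_a, sig_b, cell_area):
    """Sum the expected overlap of each pair of overlapping cells."""
    total = 0.0
    for row_a, row_b in zip(sig_a, sig_b):
        for a, b in zip(row_a, row_b):
            total += expected_overlap(a, b)
    return total * cell_area
```

For `["FW", "WE"]` joined with `["SS", "EE"]` and cells of area 4, the estimate is (0.75 + 0.1875 + 0 + 0) × 4 = 3.75.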
Affinity degree
For some proposed algorithms, it is possible to return an approximate answer evaluating only cell types.
For other algorithms, when evaluating cell types it is also required to compute an approximate value in the interval [0,1] that indicates a true percentage of the response.
– Affinity degree: it is based on the expected area of cells overlapping (Azevedo et al., 2005).
Cell types | Empty | Weak | Strong | Full
Empty | 0 | 0 | 0 | 0
Weak | 0 | 0.0625 | 0.1875 | 0.25
Strong | 0 | 0.1875 | 0.5625 | 0.75
Full | 0 | 0.25 | 0.75 | 1
Table of affinity degree
Equal
Equal algorithm using 4CRS: the approximate answer is equal to the sum of affinity degrees divided by the number of comparisons of pairs of objects, if no trivial case occurs.
Sum of affinity degree (overlap of cells of the same type: E×E, W×W, S×S, F×F)
– µE×E = 1
– µW×W = 0.0625
– µS×S = 0.5625
– µF×F = 1
Trivial case: not equal
– An overlap of different cell types (for example S×E, S×W or F×S) results in false
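A minimal sketch of the Equal test, under stated assumptions: signatures are lists of strings over E, W, S, F on the same aligned grid, the "number of comparisons" is taken to be the number of compared cell pairs, and the trivial "false" case is reported as 0.0. The function name is hypothetical.

```python
# Affinity degrees for same-type overlaps, as listed on the slide.
EQUAL_AFFINITY = {"E": 1.0, "W": 0.0625, "S": 0.5625, "F": 1.0}


def approximate_equal(sig_a, sig_b):
    """Return a value in [0, 1]; 0.0 means the trivial 'not equal' case."""
    total = 0.0
    comparisons = 0
    for row_a, row_b in zip(sig_a, sig_b):
        for a, b in zip(row_a, row_b):
            if a != b:
                return 0.0  # overlap of different cell types: not equal
            total += EQUAL_AFFINITY[a]
            comparisons += 1
    return total / comparisons if comparisons else 0.0
```

Comparing `["FW", "WE"]` with itself gives (1 + 0.0625 + 0.0625 + 1) / 4 = 0.53125: identical signatures do not prove that the polygons are equal, because Weak and Strong cells only bound the covered area.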
Different
The Different algorithm is the opposite of the Equal algorithm
– The affinity degree is equal to 1 minus the Equal affinity degree
Sum of affinity degree (overlap of cells of the same type: E×E, W×W, S×S, F×F)
– µE×E = 0
– µW×W = 1 - 0.0625
– µS×S = 1 - 0.5625
– µF×F = 0
Trivial case: different
– An overlap of different cell types (for example S×E, S×W or F×S) results in true
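The Different test can be sketched the same way. The assumptions are: signatures are lists of strings over E, W, S, F on the same aligned grid, the sum is normalized by the number of compared cell pairs, and the trivial "true" case is reported as 1.0. The function name is hypothetical.

```python
# Affinity degrees for same-type overlaps in the Different algorithm.
DIFFERENT_AFFINITY = {"E": 0.0, "W": 1 - 0.0625, "S": 1 - 0.5625, "F": 0.0}


def approximate_different(sig_a, sig_b):
    """Return a value in [0, 1]; 1.0 means the trivial 'different' case."""
    total = 0.0
    comparisons = 0
    for row_a, row_b in zip(sig_a, sig_b):
        for a, b in zip(row_a, row_b):
            if a != b:
                return 1.0  # overlap of different cell types: different
            total += DIFFERENT_AFFINITY[a]
            comparisons += 1
    return total / comparisons if comparisons else 0.0
```

Comparing `["FW", "WE"]` with itself gives (0 + 0.9375 + 0.9375 + 0) / 4 = 0.46875.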
Disjoint
Disjoint: two objects are disjoint if they have no portion in common
– Case I: at least one overlap of Full × Weak, Full × Strong, Full × Full or Strong × Strong
  • Trivial case: not disjoint (exact answer)
– Case II: only overlap of Empty × Weak, Empty × Strong or Empty × Full
  • Disjoint (partial answer)
  • Affinity degree += 1
– Case III: Weak × Weak or Weak × Strong
  • Disjoint (partial answer)
  • Affinity degree += 1 - expected area(type1, type2)
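The three cases can be sketched as below. Several points are assumptions, not stated on the slide: signatures are lists of strings over E, W, S, F on the same aligned grid, the accumulated affinity degree is normalized by the number of compared cell pairs (as in the Equal algorithm), Empty × Empty pairs count like the other Empty overlaps, and the expected area of two cell types is the product of the per-type expected areas.

```python
MU = {"E": 0.0, "W": 0.25, "S": 0.75, "F": 1.0}


def approximate_disjoint(sig_a, sig_b):
    """Return a value in [0, 1]; 0.0 is the exact 'not disjoint' answer."""
    total = 0.0
    comparisons = 0
    for row_a, row_b in zip(sig_a, sig_b):
        for a, b in zip(row_a, row_b):
            pair = {a, b}
            if "E" in pair:
                total += 1.0  # Case II: one of the cells is Empty
            elif "F" in pair or pair == {"S"}:
                return 0.0  # Case I: the polygons must share area in this cell
            else:
                total += 1.0 - MU[a] * MU[b]  # Case III: W×W or W×S
            comparisons += 1
    return total / comparisons if comparisons else 1.0
```

Case I is exact because two cells that are each more than half covered, or a Full cell paired with any non-empty cell, necessarily share some area.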
Distance
Distance can be estimated from 4CRS signatures by computing the distance between the cells corresponding to the polygons' borders (Weak and Strong cells).
Distance = average of the minimum and maximum distances
[Figure: panels (a), (b) and (c) illustrating the minimum distance and the maximum distance]
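The slide does not define the minimum and maximum distances precisely, so the following sketch rests on an interpretation that should be read as an assumption. For every pair of border cells (Weak or Strong), the smallest and largest possible distances between points of the two cells are computed; the minimum distance is the smallest lower bound, and the maximum distance is the smallest upper bound over all pairs. Both signatures are assumed to be lists of strings over E, W, S, F on the same grid of square cells, and all names are hypothetical.

```python
import math


def border_cells(signature):
    """Grid coordinates (col, row) of the Weak and Strong cells."""
    return [(col, row)
            for row, line in enumerate(signature)
            for col, c in enumerate(line) if c in "WS"]


def cell_distance_bounds(c1, c2, cell):
    """Smallest and largest distance between points of two square cells."""
    dx, dy = abs(c1[0] - c2[0]), abs(c1[1] - c2[1])
    lower = math.hypot(max(dx - 1, 0), max(dy - 1, 0)) * cell
    upper = math.hypot(dx + 1, dy + 1) * cell
    return lower, upper


def approximate_distance(sig_a, sig_b, cell):
    """Average of the minimum and maximum distances between border cells."""
    bounds = [cell_distance_bounds(a, b, cell)
              for a in border_cells(sig_a) for b in border_cells(sig_b)]
    if not bounds:
        raise ValueError("both signatures need at least one Weak or Strong cell")
    d_min = min(lower for lower, _ in bounds)
    d_max = min(upper for _, upper in bounds)
    return (d_min + d_max) / 2.0
```

For two border cells four columns apart in the same row and cells of side 1, the gap between the cells is 3 and the farthest corners are √26 apart, so the estimate is (3 + √26) / 2.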
Conclusions
Goal
– Provide an estimated result in orders of magnitude less time than the time to compute an exact answer, along with a confidence interval for the answer.
Proposals
– Use raster approximations for approximate query processing in spatial databases
– Use the 4CRS signature to process the queries over polygons, avoiding accessing the real data.
– Propose many algorithms for approximate processing
– Use the expected area of polygons (Azevedo et al., 2005) to estimate responses
Future work
– Implement and evaluate algorithms involving other kinds of datasets, for example, points and polylines, and combinations of them:
  • point × polyline, polyline × polygon and polygon × polyline.
– The experimental evaluation is not addressed in this work; it is ongoing work developed on Secondo (Güting et al., 2005), which is an extensible DBMS platform for research prototyping and teaching.