View Usability and Safety for the Answering of Top-k Queries
via Materialized Views
Eftychia Baikousi, Panos Vassiliadis
University of Ioannina
Dept. of Computer Science
DOLAP 2009, Hong Kong, 6 Nov 2009
Forecast
Problem of answering a top-k query through materialized top-n views
Theoretical guarantees for when a top-n materialized view can answer a top-k query
Algorithmic techniques for answering a top-k query from a materialized view
Properties of the safe areas of views
Contents
Motivation & Problem Definition
Overview of the Method
Theoretical guarantees
Strictness of theorem
Safe area properties
Experiments
Conclusions
Future extensions
Top-k query
Given a relation R(id, x1, x2, x3) and a query Q = sum(x1, x2, x3),
find the k tuples with the highest scores according to Q

R
id  x1   x2   x3   sum
a   0.3  0.6  0.7  1.6
b   0.2  0.3  0.4  0.9
c   0.4  0.5  0.9  1.8
d   0.7  0.6  0.1  1.4

Top-2 tuples: c (1.8), a (1.6)
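A minimal Python sketch of the top-k computation above, assuming the relation fits in memory as a dict; `top_k` is a hypothetical helper, not part of the presented method. It scores every tuple and keeps the k best with `heapq.nlargest`.

```python
import heapq

# Relation R(id, x1, x2, x3) from the table above.
R = {
    "a": (0.3, 0.6, 0.7),
    "b": (0.2, 0.3, 0.4),
    "c": (0.4, 0.5, 0.9),
    "d": (0.7, 0.6, 0.1),
}

def top_k(relation, k, score):
    """Return the k (id, score) pairs with the highest score, best first."""
    scored = ((tid, score(vals)) for tid, vals in relation.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Q = sum(x1, x2, x3); rounding hides floating-point noise in the sums.
result = [(tid, round(s, 2)) for tid, s in top_k(R, 2, sum)]
print(result)  # [('c', 1.8), ('a', 1.6)]
```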
Motivating Example
Given a relation Region(id, name, today_traffic, yesterday_traffic, budget, ...) and a materialized view V of the top-2 regions according to the query
Q: 0.6*difftraffic + 0.4*budget, where difftraffic = today_traffic - yesterday_traffic

Region
id  name     t_traffic  y_traffic  budget  score
1   LA       18         20         21       7.2
2   NY       42         54         15      -1.2
3   Dallas   26         22          8       5.6
4   Chicago  30         28         11       5.6

V
name    score
LA      7.2
Dallas  5.6

Scenario: telecommunication company executives see sales reports on their PDAs
Can a new top-k query (e.g. 0.5*difftraffic + 0.3*budget) be answered from V?
Problem definition
Given:
a base relation R(ID, X, Y)
a materialized view V(ID, X, Y, s) that contains the top-n tuples of the form (id, s), where s = w·(a·x + y) and w, a are positive parameters
a query Q(ID, X, Y, s_Q) that requests the top k ≤ n tuples of the form (id, s_Q), where s_Q = w_Q·(a_Q·x + y) and w_Q, a_Q are positive parameters
Introduce an algorithm that decides whether V by itself is suitable to answer Q and, if so, computes Q’s answer
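A brute-force sketch of the problem setting, assuming small in-memory data; the relation R and the helpers `score` and `top` are hypothetical. It materializes V and computes Q's answer directly from R, which is the reference answer any view-based method must reproduce. It also shows that the positive factor w never changes a ranking.

```python
import heapq
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def score(w: float, a: float, p: Point) -> float:
    """s = w * (a*x + y) with w, a > 0."""
    x, y = p
    return w * (a * x + y)

def top(relation: Dict[str, Point], n: int, w: float, a: float) -> List[str]:
    """Ids of the n highest-scoring tuples, best first."""
    best = heapq.nlargest(n, relation.items(), key=lambda item: score(w, a, item[1]))
    return [tid for tid, _ in best]

# Hypothetical base relation R(ID, X, Y).
R = {"p": (5.0, 1.0), "q": (1.0, 4.0), "r": (3.0, 2.5), "s": (0.5, 0.5)}

view = top(R, 3, w=1.0, a=0.5)    # materialized top-n view V, n = 3
answer = top(R, 1, w=1.0, a=2.0)  # reference answer of Q (k = 1), computed from all of R
# w > 0 only rescales scores, so it never changes a ranking.
assert top(R, 1, w=7.0, a=2.0) == answer
print(view, answer)  # ['q', 'r', 'p'] ['p']
```

Here Q's answer happens to be stored in V; the question the talk addresses is how to decide this from V alone, without scanning R.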
Related Work
Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis: “Answering Top-k Queries Using Views”, VLDB ’06
Answers a top-k query Q by making use of ranking views V
LPTA works in 2 steps:
1. SelectViews(V, Q): selects an efficient subset U of the views for answering Q; U contains the sorted lists over each attribute of the relation
2. Answer Q from U: a linear-programming adaptation of the TA algorithm; stopping condition: the solution of the linear program ≤ the minimum score in the current top-k
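For reference, a sketch of the classic Threshold Algorithm (TA) that LPTA adapts, assuming every sorted list covers all tuples and all weights are non-negative. This is plain TA with a weighted-sum threshold, not LPTA itself: LPTA replaces the threshold with the solution of a linear program over the views. The function name and data layout are assumptions.

```python
import heapq
from typing import Dict, List, Tuple

Entry = Tuple[str, float]  # (tuple id, attribute value)

def threshold_algorithm(lists: List[List[Entry]], weights: List[float],
                        k: int) -> List[Entry]:
    """Classic TA for the monotone score sum(w_i * x_i), with all w_i >= 0.

    lists[i] holds every tuple's value of attribute i, sorted descending;
    random access is simulated with one dict per attribute.
    """
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]
    seen: Dict[str, float] = {}
    for depth in range(len(lists[0])):
        # Sorted access: read the next entry of every list.
        for lst in lists:
            tid = lst[depth][0]
            if tid not in seen:
                # Random access: fetch the tuple's other attributes and score it.
                seen[tid] = sum(w * lookup[j][tid] for j, w in enumerate(weights))
        # Threshold: the best score a not-yet-seen tuple could still reach.
        threshold = sum(w * lst[depth][1] for w, lst in zip(weights, lists))
        best = heapq.nlargest(k, seen.items(), key=lambda item: item[1])
        if len(best) == k and best[-1][1] >= threshold:
            return best  # stop early: no unseen tuple can enter the top-k
    return heapq.nlargest(k, seen.items(), key=lambda item: item[1])

# Relation R(id, x1, x2, x3) from the top-k example, with Q = x1 + x2 + x3.
R = {"a": (0.3, 0.6, 0.7), "b": (0.2, 0.3, 0.4),
     "c": (0.4, 0.5, 0.9), "d": (0.7, 0.6, 0.1)}
lists = [sorted(((tid, vals[i]) for tid, vals in R.items()),
                key=lambda e: e[1], reverse=True) for i in range(3)]
print([tid for tid, _ in threshold_algorithm(lists, [1.0, 1.0, 1.0], k=2)])  # ['c', 'a']
```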
Related Work – Geometric Representation (0)
Assume a relation R(ID, X, Y), two views Vu(id, Score1) and Vd(id, Score2), and a query Q(id, Score)
Scoring functions have the form Score = w·(a·x + y)
Each scoring function is depicted as the line y = a⁻¹·x
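A small numeric check of this geometric reading, with hypothetical helper names: the score w·(a·x + y) is a positive multiple of the projection of (x, y) onto the line y = a⁻¹·x, so ranking by score is ranking by that projection, and all points on a line perpendicular to it share the same score.

```python
import math

def score(w: float, a: float, p: tuple) -> float:
    """Score = w * (a*x + y), with w, a > 0."""
    x, y = p
    return w * (a * x + y)

def projection(a: float, p: tuple) -> float:
    """Signed length of p's projection onto the line y = (1/a)*x, whose direction is (a, 1)."""
    x, y = p
    return (a * x + y) / math.hypot(a, 1.0)

w, a = 2.0, 0.5
p = (7.0, 4.0)
# The score is the projection scaled by the positive factor w * |(a, 1)|,
# so sorting by score and sorting by projection give the same order.
assert math.isclose(score(w, a, p), w * math.hypot(a, 1.0) * projection(a, p))
# Moving along (1, -a), which is perpendicular to (a, 1), keeps the score fixed:
# the lines perpendicular to y = (1/a)*x are exactly the iso-score lines.
q = (p[0] + 1.0, p[1] - a)
assert math.isclose(score(w, a, p), score(w, a, q))
print(score(w, a, p), score(w, a, q))  # 15.0 15.0
```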
Related Work – Geometric Representation (1)
M: the kth tuple of Q
Stopping condition: the sweeping line (shown in the figure) crosses position A1B
Any point below line AB has a smaller score than M with respect to Q
Related Work – Geometric Representation (2)
Stopping condition: the intersection point S of the two sweeping lines (shown in the figure) lies on line AB
Any point below line AB has a smaller score than M with respect to Q
Related Work
SelectViews(V, Q) is data dependent:
it is based on an estimate of the last tuple of Q according to the data distribution
There are no theoretically established guarantees that the selected set of views will answer Q
Overview of the method
1. Theoretical guarantees of Answering a query Q via a view VU
2. Theoretical guarantees are too strict
3. Parallelism of safe areas
Example
R
id  x  y  V (x+2y)  Q (2x+y)
a   7  4  15        18
b   2  7  16        11
c   4  2   8        10
d   1  1   3         3

V: top-3 with score x+2y
Q: top-1 with score 2x+y
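A quick Python check of this example; the dictionary layout and the helper `top` are hypothetical. It recomputes both rankings from R and also computes Q's answer using only the tuples stored in V.

```python
R = {"a": (7, 4), "b": (2, 7), "c": (4, 2), "d": (1, 1)}

def v_score(p):
    return p[0] + 2 * p[1]   # view V: x + 2y

def q_score(p):
    return 2 * p[0] + p[1]   # query Q: 2x + y

def top(ids, n, f):
    """The n ids with the highest score under f, best first."""
    return sorted(ids, key=lambda t: f(R[t]), reverse=True)[:n]

V = top(R, 3, v_score)
Q = top(R, 1, q_score)
Q_from_V = top(V, 1, q_score)  # answer computed using only the tuples stored in V
print(V, Q, Q_from_V)  # ['b', 'a', 'c'] ['a'] ['a']
```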
Construction of safe area
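V_U(ID, X, Y, s_U): contains the top-n tuples with score s_U = w_U·(a_U·x + y)
t_N: the nth tuple in V_U
L_U (segment x_NU y_NU): the line perpendicular to V_U that passes through t_N and meets the X and Y axes at x_NU and y_NU
L_Q (segment x_NU y_Q): the line perpendicular to Q that passes through x_NU (and meets the Y axis at y_Q)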
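A sketch of this construction, assuming non-negative attribute values and scores normalized to the form a·x + y (the positive factor w does not affect the geometry). The `Line` class and `construct_lines` are hypothetical names. As on the slide, L_Q is drawn through x_NU; this assumes a_Q ≥ a_U, as in the running example, where V scores x + 2y = 2·(0.5·x + y) and Q scores 2x + y.

```python
from dataclasses import dataclass

@dataclass
class Line:
    """The line a*x + y = c in the (X, Y) plane, with a > 0."""
    a: float
    c: float

    def x_intercept(self) -> float:
        return self.c / self.a

    def y_intercept(self) -> float:
        return self.c

def construct_lines(a_u: float, t_n: tuple, a_q: float):
    """Return (L_U, L_Q) for a view with parameter a_u whose n-th tuple is t_n."""
    x_n, y_n = t_n
    # L_U is an iso-score line of V_U (perpendicular to V_U) through t_N.
    l_u = Line(a_u, a_u * x_n + y_n)
    # L_Q is an iso-score line of Q (perpendicular to Q) through (x_NU, 0).
    l_q = Line(a_q, a_q * l_u.x_intercept())
    return l_u, l_q

# Running example: a_U = 0.5, t_N = c = (4, 2), a_Q = 2.
l_u, l_q = construct_lines(0.5, (4, 2), 2.0)
print(l_u.x_intercept(), l_u.y_intercept(), l_q.y_intercept())  # 8.0 4.0 16.0
```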
Safe area
The safe area is defined as the area “above” line L_Q (shaded in the figure)
Observations:
Any tuple in the safe area has a higher score (with respect to Q) than any tuple outside the safe area
Tuples in the safe area belong to both V_U and Q
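Under the same assumptions (non-negative values, a_Q ≥ a_U, scores normalized to a·x + y), a sketch that checks the first observation on the running example. The hypothetical helper `in_safe_area` tests whether a point lies strictly above L_Q; every such tuple then outscores every other tuple of R under Q.

```python
R = {"a": (7, 4), "b": (2, 7), "c": (4, 2), "d": (1, 1)}
A_U, A_Q = 0.5, 2.0   # V scores x + 2y = 2*(0.5x + y); Q scores 2x + y
T_N = R["c"]          # n-th (last) tuple of the top-3 view

def q_score(p):
    return A_Q * p[0] + p[1]

# L_Q: A_Q*x + y = A_Q * x_NU, where x_NU is the X-intercept of L_U.
x_nu = (A_U * T_N[0] + T_N[1]) / A_U
threshold = A_Q * x_nu

def in_safe_area(p) -> bool:
    return q_score(p) > threshold   # strictly "above" L_Q

safe = [t for t in R if in_safe_area(R[t])]
others = [t for t in R if t not in safe]
assert all(q_score(R[s]) > q_score(R[o]) for s in safe for o in others)
print(safe)  # ['a']
```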
Answering Q from VU
THEOREM 1
V_U can answer Q if the safe area contains at least k tuples
The converse does not always hold
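A randomized sanity check of Theorem 1, assuming points drawn uniformly from the unit square (so all values are non-negative) and a fixed seed; all names are hypothetical. The safe-area boundary uses the larger Q-score of L_U's two axis intercepts, which coincides with the slide's line through x_NU when a_Q ≥ a_U; taking the maximum for a_Q < a_U is an assumption of this sketch. Whenever the safe area holds at least k tuples, the top-k computed from V alone must equal the top-k computed from R.

```python
import random

def top(points, n, a):
    """The n points with the highest a*x + y (w is omitted: it never changes the order)."""
    return sorted(points, key=lambda p: a * p[0] + p[1], reverse=True)[:n]

def safe_count(view, a_u, a_q):
    """Number of tuples of the view strictly inside the safe area of Q."""
    x_n, y_n = view[-1]                  # t_N: the n-th (last) tuple of the view
    c_u = a_u * x_n + y_n                # L_U: a_u*x + y = c_u
    x_nu, y_nu = c_u / a_u, c_u          # intercepts of L_U on the X and Y axes
    threshold = max(a_q * x_nu, y_nu)    # best Q-score of any point under L_U
    return sum(1 for x, y in view if a_q * x + y > threshold)

def check_theorem1(trials=500, seed=0):
    """Return how many random cases Theorem 1 applied to; assert each one."""
    rng = random.Random(seed)
    applied = 0
    for _ in range(trials):
        points = [(rng.random(), rng.random()) for _ in range(200)]
        n = rng.randint(5, 50)
        k = rng.randint(1, n)
        a_u, a_q = rng.uniform(0.1, 10.0), rng.uniform(0.1, 10.0)
        view = top(points, n, a_u)
        if safe_count(view, a_u, a_q) >= k:
            applied += 1
            # Theorem 1: V alone yields exactly Q's top-k.
            assert top(view, k, a_q) == top(points, k, a_q)
    return applied

print(check_theorem1())
```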
Answering Q from VU cont.
THEOREM 2
V_U may be able to answer Q even if the safe area contains fewer than k tuples
This holds when the area (yellow triangle in the figure) defined by line L_U, the X axis and line L1, which produces the lowest possible score for Q from tuples of V_U, is void of tuples
Algorithm TestViewSuitability
Three main steps:
Step 1: Compute the safe area (Q, V)
Step 2: Count the tuples in V that belong to the safe area
Step 3: If there are at least k, return true; else return false
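A sketch of TestViewSuitability under the same assumptions as the earlier examples (non-negative values, scores normalized to a·x + y); the function name `is_view_suitable` and the row layout are hypothetical. Step 1 uses the larger Q-score of L_U's two axis intercepts, which is the slide's line through x_NU when a_Q ≥ a_U. When the test succeeds, Q's answer is simply the top-k tuples of V under Q.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, float, float]  # (id, x, y) of a tuple stored in V

def is_view_suitable(view: List[Row], a_u: float, a_q: float,
                     k: int) -> Tuple[bool, Optional[List[str]]]:
    """Return (True, ids of Q's top-k) if V can answer Q by Theorem 1, else (False, None)."""
    def s_u(r: Row) -> float:
        return a_u * r[1] + r[2]

    def s_q(r: Row) -> float:
        return a_q * r[1] + r[2]

    # Step 1: compute the safe area of Q for V.
    t_n = min(view, key=s_u)                  # n-th (lowest-scoring) tuple of V
    c_u = s_u(t_n)                            # L_U: a_u*x + y = c_u
    threshold = max(a_q * c_u / a_u, c_u)     # L_Q: a_q*x + y = threshold
    # Step 2: count the tuples of V inside the safe area.
    safe = [r for r in view if s_q(r) > threshold]
    # Step 3: Theorem 1 guarantees correctness when at least k tuples are safe.
    if len(safe) >= k:
        ranked = sorted(safe, key=s_q, reverse=True)
        return True, [r[0] for r in ranked[:k]]
    return False, None

# Running example: V is the top-3 under x + 2y, i.e. a_U = 0.5; Q is 2x + y.
V = [("b", 2, 7), ("a", 7, 4), ("c", 4, 2)]
print(is_view_suitable(V, 0.5, 2.0, k=1))  # (True, ['a'])
print(is_view_suitable(V, 0.5, 2.0, k=2))  # (False, None)
```

A false result only means that Theorem 1 gives no guarantee; by Theorem 2 the view may still be usable.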
Combining two views
Lines L_QU and L_QD characterize the safe areas of Q for V_U and V_D
L_QU ∥ L_QD, since both are perpendicular to Q
Hence the safe area of one view (V_U) is encompassed in the safe area of the other view (V_D)
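Because both boundary lines have the form a_Q·x + y = constant, comparing the two constants is enough to tell which safe area encloses the other: the lower constant gives the larger safe area. A sketch under the same assumptions as the earlier examples; the function names and the V_D parameters (a_D = 4, n-th tuple (3, 1)) are hypothetical.

```python
def safe_threshold(a_view: float, t_n: tuple, a_q: float) -> float:
    """Constant of the safe-area boundary a_q*x + y = threshold for one view."""
    x_n, y_n = t_n
    c = a_view * x_n + y_n                # the view's L line: a_view*x + y = c
    return max(a_q * c / a_view, c)       # best Q-score of any point under that line

def enclosing_view(views: dict, a_q: float) -> str:
    """Name of the view whose safe area contains the others' (lowest threshold)."""
    return min(views, key=lambda name: safe_threshold(*views[name], a_q))

# Each view is described by (a, t_N); Q uses a_Q = 2.
views = {"V_U": (0.5, (4, 2)), "V_D": (4.0, (3, 1))}
print({name: safe_threshold(*v, 2.0) for name, v in views.items()})  # {'V_U': 16.0, 'V_D': 13.0}
print(enclosing_view(views, 2.0))  # V_D
```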
Experimental methodology
Methods tested:
Our algorithm
TA algorithm (it can guarantee view usability correctness)
Goals:
Effectiveness: number of queries answered by views
Efficiency: time savings from usage of queries
Experimental methodology
Synthetic data sets: random data sets of different sizes for a relation of the form R(ID, X, Y)
Sequence of queries with random coefficients and result size k
Experimental parameters:
Parameter                          Symbol  Values
Size of source table R (tuples)    |R|     1×10^4, 5×10^4, 1×10^5
Max size of mat. view (tuples)     k       10, 50, 100, 500, 1000
Number of queries asked            |Q|     100, 1000
Effectiveness Percentage of views used for 100 queries
Effectiveness Percentage of views used for different time spans
Efficiency
Time savings from the usage of queries for different database sizes and numbers of requested results
Conflicting case: the number of stored results rises, while the savings drop
due to the size of the memory used, memory allocation becomes slow
probably one view is able to answer a lot of queries
Savings increase for reasonable values of k, of size 0.1%
Conclusions
We have provided theoretical and algorithmic results for the problem of answering top-k queries via materialized views
Theoretical and algorithmic results:
Theorem 1: theoretical guarantees for a view to answer a top-k query
Theorem 2: strictness of Theorem 1
Parallelism of safe areas
Future Work
Optimization under time and storage constraints
View caching
Hierarchical structures for the set of views
Sorting techniques
DOLAP 2009, Hong Kong, 6 Nov 2009 35
Thank you for your attention!
… many thanks to our hosts!
Auxiliary Time Savings