CSE 6392 – Data Exploration and Analysis in Relational Databases
April 20, 2006
Ranking Using Materialized View
View – results of a query
Materialized View – persistent results
Two problems need to be solved:
1. Which views should be materialized?
2. Given a query, how do you best use the materialized views?
Ranking Query
f: w1x1+w2x2+…+wmxm
k (number of tuples)
output: top-k tuples
Possible ranking algorithms:
- scan: only uses the base table
- TA: uses “views” for sorted lists
[Figure: base table with attribute columns x1 … xm and rows t1 … tn]
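The scan option can be sketched as follows. This is only an illustration: the table contents, the weights, and the name `scan_top_k` are made up for the example and are not from the lecture.

```python
import heapq

def scan_top_k(table, weights, k):
    """Return the k highest-scoring (score, tuple_id) pairs by scanning every row."""
    def score(row):
        # f = w1*x1 + w2*x2 + ... + wm*xm
        return sum(w * x for w, x in zip(weights, row))

    scored = ((score(row), tid) for tid, row in table.items())
    return heapq.nlargest(k, scored)

# Hypothetical base table: tuple id -> (x1, x2, x3)
table = {
    "t1": (0.9, 0.2, 0.4),
    "t2": (0.3, 0.8, 0.7),
    "t3": (0.5, 0.5, 0.5),
    "t4": (0.1, 0.9, 0.1),
}
top2 = scan_top_k(table, weights=(2, 4, 1), k=2)
top2_ids = [tid for _, tid in top2]
```

The scan touches every tuple regardless of k, which is the cost that TA and the view-based approaches try to avoid.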
Ranking Query – Materialized Views
• This new (not yet published) work tackles the problem of using materialized views rather than the traditional “skinny” tables
• Assume that we already have a bunch of materialized views corresponding to ranking queries:
Ex. sorted k-tuples for functions (with materialized views):
3x1+2x2+5x3 (Q1)
2x1+3x2 (Q2)
2x2+4x3 (Q3)
• If we get another query that matches one of these, we can use the corresponding materialized view.
Ranking Query – An Early Idea
• However, suppose we get the following query:
Q: 2x1+4x2+x3
• How do we solve this?
• An early idea:
Ex. Q: 2x1+5x2+4x3
Since Q = Q2 + Q3, could run the TA algorithm over the views for Q2 and Q3
• Linear programming.
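A minimal sketch of the early idea for Q: 2x1+5x2+4x3 = Q2 + Q3. It assumes that each view is available as a complete list sorted by descending view score (the slides only say "sorted k-tuples"), that the base table allows random access by tuple id, and that attribute values are small integers. The table and all function names are hypothetical.

```python
import heapq

def view_scores(table, weights):
    """Materialize a view: (score, tuple_id) pairs sorted by descending score."""
    scored = [(sum(w * x for w, x in zip(weights, row)), tid)
              for tid, row in table.items()]
    return sorted(scored, reverse=True)

def ta_sum_of_views(views, score_of, k):
    """Threshold Algorithm for a query whose score is the sum of the view scores.

    views: sorted (score, tuple_id) lists of equal length.
    score_of: tuple_id -> query score (random access to the base table).
    Returns (top-k list, depth of sorted access that was needed).
    """
    seen = set()
    top = []  # min-heap of (query score, tuple_id), at most k entries
    depth = 0
    for depth in range(1, len(views[0]) + 1):
        threshold = 0  # best score any still-unseen tuple could reach
        for view in views:
            s, tid = view[depth - 1]
            threshold += s
            if tid not in seen:
                seen.add(tid)
                heapq.heappush(top, (score_of(tid), tid))
                if len(top) > k:
                    heapq.heappop(top)
        # Stop once the current k-th best score reaches the threshold.
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True), depth

# Hypothetical base table: tuple id -> (x1, x2, x3)
table = {"t1": (9, 2, 4), "t2": (3, 8, 7), "t3": (5, 5, 5), "t4": (1, 9, 1)}
v2 = view_scores(table, (2, 3, 0))  # Q2: 2x1+3x2
v3 = view_scores(table, (0, 2, 4))  # Q3: 2x2+4x3

def score_q(tid):
    """Query score for Q: 2x1+5x2+4x3, read from the base table."""
    x1, x2, x3 = table[tid]
    return 2 * x1 + 5 * x2 + 4 * x3

top1, depth1 = ta_sum_of_views([v2, v3], score_q, k=1)
top2, depth2 = ta_sum_of_views([v2, v3], score_q, k=2)
```

On this toy data the top-1 answer is found after one round of sorted access and the top-2 answer after three of the four rounds. The trick only works when the query happens to be a sum of available views, which is not the case for Q: 2x1+4x2+x3.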
Ranking Query – Current Solution
• Geometric background.
• Suppose you have the following:
Q1: 2x1+4x2+x3, and k = 1 (top tuple)
[Figure: 2-D plot with axes X and Y, showing iso-score lines and a line perpendicular to them, labeled (3,2). Every point on an iso-score line has the same score; the highest score is the best.]
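The geometric picture can be checked numerically in two dimensions. In the sketch below the weight vector (3, 2) is an assumption, chosen only to echo the "(3,2)" label in the figure (the slide does not say what that label denotes), and the points are made up. For a linear score, points on the same iso-score line get equal scores, and ranking by score is the same as ranking by how far a point projects along the weight direction, which is perpendicular to the iso-score lines.

```python
import math

def score(w, p):
    """Linear ranking score w . p for a 2-D point p."""
    return w[0] * p[0] + w[1] * p[1]

def projection_length(w, p):
    """Signed length of p projected onto the direction of w."""
    return score(w, p) / math.hypot(w[0], w[1])

w = (3, 2)  # hypothetical weights
points = {"a": (2, 0), "b": (0, 3), "c": (2, 2), "d": (1, 1)}

# a and b both lie on the iso-score line 3x + 2y = 6.
same_line = score(w, points["a"]) == score(w, points["b"])

best_by_score = max(points, key=lambda name: score(w, points[name]))
best_by_projection = max(points, key=lambda name: projection_length(w, points[name]))
```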
Ranking Query – How Does This Actually Work?
• In the original TA algorithm, the advantage is the stopping condition.
• In this approach, the stopping condition is when the linear programming solution drops below the threshold.
• This paper is not published yet.
[Figure: query Q with tuples t1 and t2; the maximum value is marked]
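One way to write down the linear program behind this stopping condition is given below. It is a reading of the slides, not the authors' formulation. The assumptions are: attributes are normalized to [0, 1]; the query is $\sum_i w_i x_i$; views $V_1, \dots, V_r$ have weights $v_{j,i}$; and $d_j$ is the view score of the last tuple read from $V_j$ under sorted access, so every unseen tuple scores at most $d_j$ in $V_j$.

```latex
\begin{aligned}
\text{maximize}   \quad & \sum_{i=1}^{m} w_i x_i \\
\text{subject to} \quad & \sum_{i=1}^{m} v_{j,i}\, x_i \le d_j, \qquad j = 1, \dots, r, \\
                        & 0 \le x_i \le 1, \qquad i = 1, \dots, m.
\end{aligned}
```

The optimum is an upper bound on the query score of any tuple not yet seen. Taking "the threshold" on the slide to mean the score of the current k-th best tuple, the algorithm can stop as soon as the optimum is no larger than that score. The $d_j$ shrink as the views are read deeper, so the bound tightens with each round.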
Summary of Ranking
1) Fast execution of ranking queries/functions
• scan, TA, LP TA
• inverted lists
2) Ranking function in IR
• vector space/TF-IDF
• probabilistic
3) Ranking on the web
• PageRank
• HITS
4) Ranking in databases
• keyword search (DBXplorer, DISCOVER, BANKS)
• probabilistic information retrieval