CSE 6392 – Data Exploration and Analysis in Relational Databases April 20, 2006


Ranking Using Materialized Views

View – the result of a query

Materialized View – a view whose results are stored persistently

Two problems need to be solved:

1. Which views should be materialized?

2. Given a query, how do you best use the materialized views?

Ranking Query

f: w1x1+w2x2+…+wmxm

k (number of tuples)

output: top-k tuples

Possible ranking algorithms:

- Scan: uses only the base table

- TA: uses “views” for sorted lists

[Figure: base table with rows t1 … tn and columns x1 … xm]
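The two algorithms listed above can be sketched as follows. This is a minimal illustration, not the course's implementation. The table contents and the names `score`, `topk_scan`, and `topk_ta` are hypothetical, and the sketch assumes non-negative weights so that the ranking function is monotone, which TA needs for its stopping condition.

```python
import heapq

def score(weights, row):
    """Linear ranking function f = w1*x1 + ... + wm*xm."""
    return sum(w * x for w, x in zip(weights, row))

def topk_scan(table, weights, k):
    """Scan: score every tuple of the base table, keep the k best."""
    scored = [(score(weights, row), tid) for tid, row in table.items()]
    return heapq.nlargest(k, scored)

def topk_ta(table, weights, k):
    """TA sketch: read one sorted list per attribute in parallel and stop
    once the k-th best seen score reaches the threshold.
    Assumes non-negative weights (monotone scoring)."""
    m = len(weights)
    # One list of tuple ids per attribute, sorted by that attribute, descending.
    lists = [sorted(table, key=lambda tid, j=j: table[tid][j], reverse=True)
             for j in range(m)]
    seen = {}
    for depth in range(len(table)):
        for j in range(m):
            tid = lists[j][depth]
            if tid not in seen:
                # Random access to the base table for the full score.
                seen[tid] = score(weights, table[tid])
        # Best score any unseen tuple could still have.
        threshold = sum(weights[j] * table[lists[j][depth]][j] for j in range(m))
        top = heapq.nlargest(k, ((s, t) for t, s in seen.items()))
        if len(top) == k and top[-1][0] >= threshold:
            return top
    return heapq.nlargest(k, ((s, t) for t, s in seen.items()))

# Hypothetical data: four tuples with three attributes each.
TABLE = {
    "t1": (0.9, 0.2, 0.5),
    "t2": (0.4, 0.8, 0.7),
    "t3": (0.1, 0.3, 0.9),
    "t4": (0.6, 0.6, 0.1),
}
```

Both functions return a list of (score, tuple id) pairs in descending score order. On this data TA stops after reading three entries from each sorted list rather than all four.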

Ranking Query – Materialized Views

• This new (not yet published) work tackles the problem of answering ranking queries from materialized views rather than from the traditional “skinny” tables.

• Assume that we already have a number of materialized views corresponding to ranking queries.

Ex. sorted top-k tuples for the following functions (each with a materialized view):

Q1: 3x1+2x2+5x3

Q2: 2x1+3x2

Q3: 2x2+4x3

• If a new query matches one of these, we can answer it from the corresponding materialized view.

Ranking Query – An Early Idea

• However, suppose we get the following query:

Q: 2x1+4x2+x3

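• How do we solve this?

• An early idea: if the query is a sum of materialized views, run TA over those views.

Ex. Q: 2x1+5x2+4x3

Here Q = Q2 + Q3, since (2x1+3x2) + (2x2+4x3) = 2x1+5x2+4x3, so we could do the TA algorithm on Q2 + Q3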

• Linear programming.
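The early idea above (running TA over the views Q2 and Q3 when Q = Q2 + Q3) might look like the sketch below. It is only an illustration under stated assumptions: each view is taken to hold every tuple sorted by its view score, the base table is available for random access, and the data and the names `q2`, `q3`, and `topk_from_views` are hypothetical.

```python
import heapq

def q2(row):
    """View Q2: 2*x1 + 3*x2."""
    return 2 * row[0] + 3 * row[1]

def q3(row):
    """View Q3: 2*x2 + 4*x3."""
    return 2 * row[1] + 4 * row[2]

def topk_from_views(table, k):
    """TA over two materialized views, for the query Q = Q2 + Q3."""
    # Each view is a list of (view score, tuple id), sorted descending.
    view2 = sorted(((q2(r), t) for t, r in table.items()), reverse=True)
    view3 = sorted(((q3(r), t) for t, r in table.items()), reverse=True)
    seen = {}
    for (s2, tid2), (s3, tid3) in zip(view2, view3):
        for tid in (tid2, tid3):
            # Random access: full query score of the tuple just read.
            seen[tid] = q2(table[tid]) + q3(table[tid])
        # No unseen tuple can score more than the sum of the
        # last scores read from the two views.
        threshold = s2 + s3
        top = heapq.nlargest(k, ((s, t) for t, s in seen.items()))
        if len(top) == k and top[-1][0] >= threshold:
            return top
    return heapq.nlargest(k, ((s, t) for t, s in seen.items()))

# Hypothetical integer-valued data, so the scores are exact.
TABLE = {
    "t1": (9, 2, 5),
    "t2": (4, 8, 7),
    "t3": (1, 3, 9),
    "t4": (6, 6, 1),
}
```

This only works because the query happens to be an exact sum of two views. The first query on this slide, 2x1+4x2+x3, is not such a sum.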

Ranking Query – Current Solution

• Geometric background.

• Suppose you have the following:

Q1: 2x1+4x2+x3, and k = 1 (top tuple)

[Figure: 2-D plot with X axis from 0 to 3.5 and Y axis from 0 to 3. Labels: perpendicular line, (3,2); iso-score line (every point on the line has the same score); the highest score is the best.]
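The geometric picture can be checked numerically with a small two-dimensional sketch. It assumes that the figure label (3,2) is the weight vector of a ranking function 3x + 2y, which is a reading of the figure and not something the slide states. The points and the names `score`, `top1`, and `slide_along_iso_line` are hypothetical.

```python
def score(w, p):
    """Score of point p under the linear function w[0]*x + w[1]*y."""
    return w[0] * p[0] + w[1] * p[1]

def top1(points, w):
    """The top tuple is the one on the highest iso-score line."""
    return max(points, key=lambda p: score(w, p))

def slide_along_iso_line(w, p, t):
    """Move p by t along the direction (-w[1], w[0]).
    That direction is perpendicular to w, so the score is unchanged."""
    return (p[0] - t * w[1], p[1] + t * w[0])

# Assumed weight vector, taken from the figure label.
W = (3, 2)
# Hypothetical tuples plotted as points.
POINTS = [(1, 2), (2, 1), (2.5, 0.5), (0.5, 2.5)]
```

Sweeping an iso-score line in from far away along the direction of the weight vector, the first point it touches is the top tuple. In the sketch that is simply the point with the largest score.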

Ranking Query – How Does This Actually Work?

• In the original TA algorithm, the advantage is the stopping condition.

• In this approach, the algorithm stops when the linear programming solution drops below the threshold.

• This paper is not published yet.

[Figure: query Q, tuples t1 and t2, and the max value]
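One way to read the linear programming stopping condition is sketched below. It is an interpretation, not the unpublished paper's algorithm. The assumptions are that attribute values are normalized to [0, 1], that an unseen tuple cannot score higher in any view than the last score read from that view, and that the LP maximizes the query under those constraints. The LP is solved by brute-force vertex enumeration for three attributes, a stand-in for a real solver. All function names and the sample numbers are hypothetical.

```python
from itertools import combinations

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve a 3x3 system by Cramer's rule; None if it is singular."""
    d = det3(a)
    if abs(d) < 1e-12:
        return None
    sol = []
    for j in range(3):
        aj = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        sol.append(det3(aj) / d)
    return sol

def max_unseen_score(query, views, last_seen):
    """Maximize query . x subject to view . x <= last_seen and 0 <= x <= 1."""
    cons = [(list(v), s) for v, s in zip(views, last_seen)]
    for i in range(3):
        e = [0.0, 0.0, 0.0]
        e[i] = 1.0
        cons.append((e, 1.0))                 # x_i <= 1
        cons.append(([-c for c in e], 0.0))   # x_i >= 0
    best = None
    # The optimum of a bounded LP lies at a vertex: try every
    # intersection of three constraint planes that is feasible.
    for trio in combinations(cons, 3):
        x = solve3([c[0] for c in trio], [c[1] for c in trio])
        if x is None:
            continue
        if all(sum(a * xi for a, xi in zip(row, x)) <= rhs + 1e-9
               for row, rhs in cons):
            val = sum(q * xi for q, xi in zip(query, x))
            if best is None or val > best:
                best = val
    return best

def can_stop(kth_best_score, query, views, last_seen):
    """Stop once no unseen tuple can beat the current k-th best score."""
    return kth_best_score >= max_unseen_score(query, views, last_seen)

# Views Q1, Q2, Q3 and the query Q: 2x1+4x2+x3 from the slides.
VIEWS = [(3, 2, 5), (2, 3, 0), (0, 2, 4)]
QUERY = (2, 4, 1)
```

As the algorithm reads deeper into the views, the `last_seen` scores shrink, the feasible region shrinks with them, and the LP maximum falls until the stopping test succeeds.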

Summary of Ranking

1) Fast execution of ranking queries/functions

• scan, TA, LP TA

• inverted lists

2) Ranking function in IR

• vector space/TF-IDF

• probabilistic

3) Ranking on the web

• PageRank

• HITS

4) Ranking in databases

• keyword search (DBXplorer, DISCOVER, BANKS)

• probabilistic information retrieval