Answering Top-k Queries Using Views Updated

  • 8/12/2019 Answering Top-k Queries Using Views Updated

    1/64

    Answering Top-k Queries Using

    Views

    By:

    Gautam Das (Univ. of Texas),

    Dimitrios Gunopulos (Univ. of California Riverside),

    Nick Koudas (Univ. of Toronto),

    Dimitris Tsirogiannis (Univ. of Toronto)

    Presented By:

    Kushal Shah

    Lipsa Patel

    Views

    Definition: Views

    Declaring Views

    Advantages of using Views

    Views

    A view may be thought of as a table that is derived from one or more underlying base tables.

    Two kinds:

    1. Virtual: Not stored in the database; just a query for constructing the relation.

    2. Materialized: Actually constructed and stored.

    Declaring Views

    CREATE [MATERIALIZED] VIEW <name> AS <query>;

    Materialized: declared with the MATERIALIZED keyword.

    Virtual: the default when MATERIALIZED is omitted.

    Advantages of using Views

    If a database has several tables and we want to see only specific columns from specific tables, we can define a view.

    Security: permissions configured on a view can restrict specific users to seeing only specific columns.

    Top-k Query

    Top-k Query Processing Definition

    Top-k Example

    Algorithms for Top-k Query Processing

    Top-k Query Processing

    Top-k query processing = finding the k objects that have the highest overall score.

    Top-k Example

    Users' preferences regarding the ordering of the tuples of a relation can be expressed as a scoring function on the attributes of the relation, e.g.

    fQ = 3x1 + 2x2 + 5x3

    The top-k problem is to find the k tuples with the highest score according to a given scoring function.

    Relation R:

    tid  X1  X2  X3
    1    82   1  59
    2    53  19  83
    3    29  99  15
    4    80  45   8
    5    28  32  39

    Scores under fQ:

    tid  Score
    2    612
    1    543
    4    370
    3    360
    5    343
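
    The scores in this example can be reproduced with a few lines of Python. This is a minimal sketch that scans the whole relation; the names R, f_q and top_k are illustrative and do not come from the slides.

```python
# Relation R from the example: tid -> (x1, x2, x3)
R = {
    1: (82, 1, 59),
    2: (53, 19, 83),
    3: (29, 99, 15),
    4: (80, 45, 8),
    5: (28, 32, 39),
}


def f_q(t):
    """Scoring function of the example: fQ = 3*x1 + 2*x2 + 5*x3."""
    x1, x2, x3 = t
    return 3 * x1 + 2 * x2 + 5 * x3


def top_k(relation, score, k):
    """Return the k (tid, score) pairs with the highest score (naive full scan)."""
    scored = [(tid, score(t)) for tid, t in relation.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


ranking = top_k(R, f_q, 5)
```

    A full scan like this is the baseline that TA, PREFER and LPTA try to avoid.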

    Algorithms for Top-k Query Processing

    How? Which algorithms? Related work: how do we complement existing approaches?

    TA [Fagin]

    PREFER [Hristidis]: Stores multiple copies of a relation, each copy ordered according to a different scoring function.

    To answer a top-k query, the algorithm uses the single copy whose scoring function is closest to the scoring function of the query.

    Example: Simple Database Model

    N objects, M attributes (here M = 2), with one sorted list per attribute.

    Object ID  Attribute 1  Attribute 2
    a          0.9          0.85
    b          0.8          0.7
    c          0.72         0.2
    d          0.6          0.9
    ...

    Sorted L1 (by Attribute 1): (a, 0.9), (b, 0.8), (c, 0.72), (d, 0.6), ...

    Sorted L2 (by Attribute 2): (d, 0.9), (a, 0.85), (b, 0.7), (c, 0.2), ...

    Example: Threshold Algorithm

    Step 1: Parallel sorted access to each list.

    L1: (a, 0.9), (b, 0.8), (c, 0.72), (d, 0.6), ...
    L2: (d, 0.9), (a, 0.85), (b, 0.7), (c, 0.2), ...

    For each object seen:
    - get all grades by random access
    - determine Min(A1, A2)
    - if it is amongst the 2 highest seen, keep it in the buffer

    ID  A1    A2    Min(A1, A2)
    a   0.9   0.85  0.85
    d   0.6   0.9   0.6

    Example: Threshold Algorithm

    Step 2: Determine the threshold value based on the objects currently seen under sorted access: T = min(L1, L2).

    L1: (a, 0.9), (b, 0.8), (c, 0.72), (d, 0.6), ...
    L2: (d, 0.9), (a, 0.85), (b, 0.7), (c, 0.2), ...

    T = min(0.9, 0.9) = 0.9

    - If 2 objects have an overall grade ≥ the threshold value, stop;
    - else go to the next entry position in the sorted lists and repeat step 1.

    ID  A1    A2    Min(A1, A2)
    a   0.9   0.85  0.85
    d   0.6   0.9   0.6

    Example: Threshold Algorithm

    Step 1 (again): Parallel sorted access to each list.

    L1: (a, 0.9), (b, 0.8), (c, 0.72), (d, 0.6), ...
    L2: (d, 0.9), (a, 0.85), (b, 0.7), (c, 0.2), ...

    For each object seen:
    - get all grades by random access
    - determine Min(A1, A2)
    - if it is amongst the 2 highest seen, keep it in the buffer

    ID  A1    A2    Min(A1, A2)
    a   0.9   0.85  0.85
    d   0.6   0.9   0.6
    b   0.8   0.7   0.7

    Example: Threshold Algorithm

    Step 2 (again): Determine the threshold value based on the objects currently seen: T = min(L1, L2).

    L1: (a, 0.9), (b, 0.8), (c, 0.72), (d, 0.6), ...
    L2: (d, 0.9), (a, 0.85), (b, 0.7), (c, 0.2), ...

    T = min(0.8, 0.85) = 0.8

    - If 2 objects have an overall grade ≥ the threshold value, stop;
    - else go to the next entry position in the sorted lists and repeat step 1.

    ID  A1    A2    Min(A1, A2)
    a   0.9   0.85  0.85
    b   0.8   0.7   0.7

    Example: Threshold Algorithm

    Situation at stopping condition

    L1: (a, 0.9), (b, 0.8), (c, 0.72), (d, 0.6), ...
    L2: (d, 0.9), (a, 0.85), (b, 0.7), (c, 0.2), ...

    T = min(0.72, 0.7) = 0.7

    ID  A1    A2    Min(A1, A2)
    a   0.9   0.85  0.85
    b   0.8   0.7   0.7
    c   0.72  0.2   0.2

    Both buffered objects (a and b) have an overall grade ≥ T, so the algorithm stops.
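
    The walk-through above can be replayed in code. The following is a minimal sketch of the Threshold Algorithm for the two lists of the example, assuming only the four objects shown (the slides elide the rest) and Min as the aggregation function; the function name and its signature are illustrative.

```python
import heapq


def threshold_algorithm(lists, k, agg=min):
    """Sketch of TA over equally long lists sorted by descending grade.

    lists: one list of (object, grade) pairs per attribute.
    Returns the top-k (object, overall grade) pairs and the depth reached.
    """
    # Random access: grade lookup per attribute.
    grades = [dict(lst) for lst in lists]
    seen = {}
    for depth in range(len(lists[0])):
        # Step 1: parallel sorted access, one entry per list.
        row = [lst[depth] for lst in lists]
        for obj, _ in row:
            if obj not in seen:
                seen[obj] = agg(g[obj] for g in grades)
        # Step 2: threshold from the grades at the current position.
        threshold = agg(grade for _, grade in row)
        buffer = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(buffer) == k and buffer[-1][1] >= threshold:
            return buffer, depth + 1
    # Lists exhausted: everything has been seen.
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1]), len(lists[0])


L1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
L2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
top2, depth = threshold_algorithm([L1, L2], k=2)
```

    With these inputs the sketch stops at the third position of the lists with a and b in the buffer, matching the stopping situation on the slide.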

    Related Work for Top-k Query Processing

    TA: Sequential as well as Random Access

    PREFER

    Approach for Top-k Query Processing

    Top-k Query Answering using Views

    Views are materialized (incurring space overhead)

    Advantage of using views: increased performance, because views are small in size

    Space-performance tradeoff

    Example Views

    Three-attribute relation R:

    tid  X1  X2  X3
    1    82   1  59
    2    53  19  83
    3    29  99  15
    4    80  45   8
    5    28  32  39

    V1: top-5 query using function f1 = 2x1 + 5x2

    tid  Score
    3    553
    4    385
    5    216
    2    201
    1    169

    V2: top-5 query using function f2 = x2 + 2x3

    tid  Score
    2    351
    1    237
    5    177
    3    159
    4    88

    Top-k ranking queries in SQL-like syntax: SELECT TOP [k] FROM R ORDER BY Score(q)

    Score(q): function that assigns a numeric score to any tuple t

    Ranking views: views that only aim to rank. A ranking view is the materialized result of a previously asked top-k query.

    Can we answer new top-k queries efficiently using ranking views? Let's see.

    Formal Definitions

    Ranking Queries

    Ranking Views

    Ranking Queries

    Top-k ranking queries in SQL-like syntax: SELECT TOP [k] FROM R WHERE Range(q) ORDER BY Score(q)

    A ranking query may be expressed as a triple Q = (Score(q), k, Range(q)), where

    Score(q) = function that assigns a numeric score to any tuple t

    Range(q) = selection condition for the tuples of R

    Semantics: Retrieve the k tuples with the top scores satisfying the selection condition.

    Ranking Views

    Materialized ranking view V: for a previously executed query Q1 = (ScoreQ1, k1, RangeQ1), the corresponding materialized ranking view is a set of k1 (tid, ScoreQ1(tid)) pairs, ordered by decreasing values of ScoreQ1(tid).
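
    As an illustration of this definition, the sketch below materializes a ranking view from the five-tuple relation R and the function f1 = 2x1 + 5x2 of the Example Views slide. The helper name, its parameters and the range predicate x1 >= 50 are assumptions made for the example only.

```python
# Relation R from the Example Views slide: tid -> (x1, x2, x3)
R = {
    1: (82, 1, 59),
    2: (53, 19, 83),
    3: (29, 99, 15),
    4: (80, 45, 8),
    5: (28, 32, 39),
}


def materialize_ranking_view(relation, score, k, in_range=lambda t: True):
    """Evaluate Q = (score, k, in_range) and keep the result as (tid, score) pairs."""
    pairs = [(tid, score(t)) for tid, t in relation.items() if in_range(t)]
    pairs.sort(key=lambda pair: pair[1], reverse=True)  # decreasing score
    return pairs[:k]


def f1(t):
    """f1 = 2*x1 + 5*x2"""
    return 2 * t[0] + 5 * t[1]


# View V1 of the slide: top-5 under f1 with no range restriction.
V1 = materialize_ranking_view(R, f1, 5)

# Hypothetical ranged query: top-2 under f1 among tuples with x1 >= 50.
V_ranged = materialize_ranking_view(R, f1, 2, in_range=lambda t: t[0] >= 50)
```

    Note that the view stores only tids and scores, so answering a different query from it still needs random access to R for the attribute values.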

    Problems we are going to solve

    Top-k Query Answer using Views

    View Selection

    Top-k Query Answer using Views

    Given: a set U of views and a query Q

    Obtain an answer to Q by combining all the information conveyed by the views in U

    Solution: an algorithm named LPTA

    View Selection

    Problem: Given a collection of base views V = {V1, ..., Vr} and a query Q, determine the most efficient subset U of V to execute Q on.

    Input to LPTA: the subset U

    Obtaining an answer to a ranking query: running TA on the base views.

    Find the subset U that, when utilized by LPTA,

    1. provides an answer to the query

    2. provides the answer faster than running TA on the base views V

    Outline

    LPTA Algorithm

    View Selection Problem

    LPTA: Linear Programming Adaptation of the Threshold Algorithm

    LPTA for Top-k Query Answer using Views

    Top-1 query

    1. Scoring function of the query Q: fQ = 3x1 + 10x2

    2. Scoring functions of the views (the subset of views U):
       V1: fV1 = 2x1 + 5x2
       V2: fV2 = x1 + 2x2

    A view is a set of (tuple identifier, score) pairs.

    The LPTA algorithm requires sorted access on each view in non-increasing order of that score.

    LPTA Example

    Relation R:

    tid  x1  x2  x3
    1    82   1  59
    2    53  19  83
    3    29   1   2
    4    80  22  90
    5    28   8  87
    6    12  55  82
    7    16  99  42
    8    18  42  67
    9    42   1  23
    10   23  21  88

    V1: top-5 query, f1 = 2x1 + 5x2

    tid  Score
    7    527
    6    299
    4    270
    8    246
    2    201

    V2: top-3 query, f2 = x2 + 2x3

    tid  Score
    6    219
    4    202
    10   197

    Answer a top-2 query using LPTA

    LPTA Setting

    The algorithm initializes the top-k buffer to empty.

    V1:

    tid  Score
    7    527
    6    299
    4    270
    8    246
    2    201

    V2:

    tid  Score
    6    219
    4    202
    10   197

    For each tid read, random access on R to retrieve the tuple and compute its score according to the query function f3 = 3x1 + 10x2 + 5x3.

    Tuples retrieved for the first entry of each view:

    tid  x1  x2  x3
    7    16  99  42
    6    12  55  82

    Top-2 buffer: (7, 1248), (6, 996)

    Check for stopping condition

    Check for Stopping Condition

    The domain of each attribute of R is [1, 100].

    The unseen tuples in the views have to satisfy the following inequalities:

    0 ≤ x1, x2, x3 ≤ 100
    2x1 + 5x2 ≤ 527
    x2 + 2x3 ≤ 219

    Calculating Unseenmax

    Unseenmax = solution to the linear program in which we maximize the objective function f3 = 3x1 + 10x2 + 5x3 subject to these inequalities.

    A linear programming problem may be defined as the problem of maximizing or minimizing a linear function subject to linear constraints. The constraints may be equalities or inequalities. Here is a simple example.

    Find numbers x1 and x2 that maximize the sum x1 + x2 subject to the constraints

    x1 ≥ 0, x2 ≥ 0, and
    x1 + 2x2 ≤ 4
    4x1 + 2x2 ≤ 12
    -x1 + x2 ≤ 1
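
    For the running example, Unseenmax after the first lock-step can be worked out by hand. The derivation below is a sketch under one assumption: the constraints are the attribute bounds 0 ≤ x1, x2, x3 ≤ 100 together with 2x1 + 5x2 ≤ 527 and x2 + 2x3 ≤ 219, i.e. the last scores read from V1 and V2.

```latex
\begin{align*}
f_3 &= 3x_1 + 10x_2 + 5x_3
     = \tfrac{3}{2}\,(2x_1 + 5x_2) + \tfrac{5}{2}\,(x_2 + 2x_3) \\
    &\le \tfrac{3}{2}\cdot 527 + \tfrac{5}{2}\cdot 219
     = 790.5 + 547.5 = 1338 .
\end{align*}
% The bound is attained at the feasible point (x_1, x_2, x_3) = (13.5, 100, 59.5),
% since 2(13.5) + 5(100) = 527 and 100 + 2(59.5) = 219.
% Hence Unseenmax = 1338 under the stated assumptions.
```

    Since 1338 exceeds the lowest score in the top-2 buffer (996), the stopping condition would not yet hold at this point and LPTA would read the next entry of each view.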

    Maximize the function

    Convex region

    This system of inequalities defines a convex region.

    Occasionally, the maximum occurs along an entire edge or face of the constraint set, but then the maximum occurs at a corner point as well.

    LPTA - Example

    [Figure: top-1 query on a two-attribute relation R(X1, X2) with normalized domain [0,1], drawn as the unit square O (0,0), P (1,0), R (1,1), T (0,1). Views V1 and V2 are shown as lists of (tid, score) pairs sorted by score, five entries each; the first entry of each view has been read and the stopping condition is marked against the query vector Q.]

    Views and the top-k query are represented by vectors denoting the direction of increasing score.

    Sorted access on V1: a sweeping line perpendicular to V1 moves from infinity to the origin.

    Score of a tuple with respect to the query: project that tuple onto the vector of the query.

    Score of a tuple with respect to a view: project that tuple onto the vector of the view.

    UNSEENMAX: the maximum possible score, with respect to the scoring function of the query, of any tuple not yet visited in the views.

    LPTA - Example (cont)

    [Figure: the same top-1 setting on the unit square O (0,0), P (1,0), R (1,1), T (0,1) after a second lock-step; the first two entries of V1 and of V2 have been read and the stopping condition is marked against the query vector Q.]

    The algorithm will stop early if the scoring functions of the views are similar to the scoring function of the query.

    LPTA Algorithm Pseudo Code

    There is sequential as well as random access:

    Sequential access on the views

    Random access on the base table to retrieve the tuple
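
    The pseudo code itself is not reproduced in this transcript, so the following is only a sketch of how the LPTA loop could look on the ten-tuple example (views V1 and V2, query f3 = 3x1 + 10x2 + 5x3, top-2). It is not the authors' code. Assumptions: attribute bounds of [0, 100], a tiny brute-force vertex enumeration standing in for a real LP solver, and hypothetical names (lp_max, lpta, dot, det3).

```python
from fractions import Fraction
from itertools import combinations

R = {
    1: (82, 1, 59), 2: (53, 19, 83), 3: (29, 1, 2), 4: (80, 22, 90),
    5: (28, 8, 87), 6: (12, 55, 82), 7: (16, 99, 42), 8: (18, 42, 67),
    9: (42, 1, 23), 10: (23, 21, 88),
}
# Each view: (coefficients of its scoring function, sorted (tid, score) pairs).
V1 = ((2, 5, 0), [(7, 527), (6, 299), (4, 270), (8, 246), (2, 201)])
V2 = ((0, 1, 2), [(6, 219), (4, 202), (10, 197)])
QUERY = (3, 10, 5)  # f3 = 3*x1 + 10*x2 + 5*x3


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def lp_max(objective, constraints):
    """Maximize objective.x subject to coef.x <= bound, for 3 variables.

    Toy solver: tries every vertex (3 tight constraints, Cramer's rule).
    """
    best = None
    for trio in combinations(constraints, 3):
        rows = [coef for coef, _ in trio]
        d = det3(rows)
        if d == 0:
            continue
        x = []
        for col in range(3):
            m = [list(r) for r in rows]
            for r, (_, bound) in enumerate(trio):
                m[r][col] = bound
            x.append(Fraction(det3(m), d))
        if all(dot(coef, x) <= bound for coef, bound in constraints):
            value = dot(objective, x)
            best = value if best is None else max(best, value)
    return best


def lpta(relation, views, query, k, lo=0, hi=100):
    """Lock-step sorted access on the views, random access on the relation."""
    box = [((1, 0, 0), hi), ((0, 1, 0), hi), ((0, 0, 1), hi),
           ((-1, 0, 0), -lo), ((0, -1, 0), -lo), ((0, 0, -1), -lo)]
    scores = {}
    steps = min(len(pairs) for _, pairs in views)
    for depth in range(steps):
        last_seen = []
        for coef, pairs in views:
            tid, view_score = pairs[depth]           # sorted access on the view
            scores[tid] = dot(query, relation[tid])  # random access on R
            last_seen.append((coef, view_score))
        # Best query score any unseen tuple could still have.
        unseen_max = lp_max(query, last_seen + box)
        top = sorted(scores.items(), key=lambda p: p[1], reverse=True)[:k]
        if len(top) == k and top[-1][1] >= unseen_max:
            return top, depth + 1
    return None, steps  # views exhausted before the stopping condition held


answer, lock_steps = lpta(R, [V1, V2], QUERY, k=2)
```

    Under these assumptions the sketch returns tuples 7 and 6 after two lock-steps, i.e. four sorted accesses in total. A production implementation would call a proper LP library instead of enumerating vertices.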

    Determining Factor for Performance: LPTA versus TA

    Sequential and random accesses are highly correlated: every sequential access incurs a random access.

    As a result, the determining factor for performance is (the distance from the beginning of the view that each algorithm has to traverse, i.e. read sequentially, before halting with the correct answer) × (the number of views participating in the process).

    d = number of lock-steps, r = number of views

    Running cost: O(dr)

    Outline

    LPTA Algorithm

    View Selection Problem

    View Selection Problem

    Given a collection of views V = {V1, ..., Vr} and a query Q, determine the most efficient subset U ⊆ V to execute Q on.

    Conceptual discussion of view selection:

    Two-attribute relation (two dimensions)

    Multi-attribute relation (any dimension)

    The domain of each attribute is normalized to [0,1]

    An m-attribute relation is referred to as m-dimensional

    View Selection: Two Dimensions (same side)

    [Figure: unit square OPRT with O (0,0), P (1,0), R (1,1), T (0,1) and axes X, Y; query vector Q and view vectors V1, V2; M marks the minimum top-k tuple; points A, B, B1, B2.]

    Two views V1 and V2 and the query Q are represented by vectors.

    Both view vectors are on the same side (clockwise) of the query vector.

    AB ⊥ Q passes through M and intersects the unit square.

    ABR: top-k tuples

    ABPOT: remaining tuples

    Sorted access to V1: a sweeping line ⊥ to V1 moves from infinity to the origin.

    Stopping condition for V1: the sweep line crosses AB1, because the convex polygon AB1POT contains the unseen tuples and

    score(unseen)

    View Selection: Two Dimensions (same side)

    Conclusion

    V2 is slower than V1.

    If several views in two dimensions are available and all their vectors are on one side of the query vector, then it is optimal for LPTA to use the view whose vector is closest to the query vector.

    Estimating the Number of Tuples

    The number of tuples could be estimated and compared by simply comparing the areas of the respective triangles.

    Such an approach needs a uniform distribution within the triangles, which is often quite unrealistic.

    Our approach to view selection utilizes the conceptual conclusions and borrows knowledge of the actual data distribution.

    View Selection: Two Dimensions (either side of query)

    [Figure: unit square with O (0,0), P (1,0), R (1,1), T (0,1) and axes X, Y; query vector Q with view vectors V1 and V2 on either side of it; M marks the minimum top-k tuple; points A, B, A1, B1.]

    LPTA can use only V1 or only V2 for execution.

    If it uses only V1 to answer the query, the stopping condition is reached once the sweep line perpendicular to V1 crosses position A1B.

    For V2: AB1

    View Selection: Two Dimensions (either side of query)

    [Figure: as on the previous slide, with the additional sweep-line positions A11B11 (perpendicular to V1) and A21B21 (perpendicular to V2).]

    Running LPTA on both V1 and V2, rather than on only one of V1 or V2? Two views are better than one.

    The intersection point of the sweep lines perpendicular to V1 and V2 is on the line AB.

    The stopping condition is reached when the sweep lines cross A11B11 and A21B21 respectively, such that

    1) the intersection point of A11B11 and A21B21 is on line AB

    2) NumTuples(A11B11R) = NumTuples(A21B21PR), since the algorithm sweeps each view in lock-step

    LPTA on both Views versus One

    With two views, each sweep line stops before the stopping position it would have to reach if only one view had been used.

    Total number of sorted accesses for two views:

    NumTuples(A11B11R) + NumTuples(A21B21PR) = 2 NumTuples(A11B11R)

    If Min(NumTuples(A1BR), NumTuples(AB1PR), 2 NumTuples(A11B11R)) = NumTuples(A1BR): use V1

    If Min(NumTuples(A1BR), NumTuples(AB1PR), 2 NumTuples(A11B11R)) = NumTuples(AB1PR): use V2

    Else use both V1 and V2

    Theorem for Two Dimensional Case

    Theorem 1: Set of views = {V1, ..., Vr}, query = Q, two-dimensional dataset.

    Va = the view closest to the query in the anticlockwise direction

    Vc = the view closest to the query in the clockwise direction

    So they are on either side of the query.

    Optimal execution of LPTA requires the use of either Va or Vc, i.e., the use of a subset of {Va, Vc}.
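
    Finding the candidates Va and Vc is a matter of comparing angles. The sketch below assumes each view and the query are given by the two non-negative coefficients of their linear scoring functions, read as a direction vector; the function name is illustrative and the view V3 = x1 + 5x2 is invented so that one view lies on the anticlockwise side.

```python
import math


def closest_views(query, views):
    """Return (Va, Vc): the view names closest to the query vector on the
    anticlockwise and clockwise side, or None if a side has no view.

    Views parallel to the query (angle difference 0) are ignored here.
    """
    q_angle = math.atan2(query[1], query[0])
    va = vc = None  # (name, angular distance to the query)
    for name, (x, y) in views.items():
        delta = math.atan2(y, x) - q_angle  # > 0 means anticlockwise of Q
        if delta > 0 and (va is None or delta < va[1]):
            va = (name, delta)
        elif delta < 0 and (vc is None or -delta < vc[1]):
            vc = (name, -delta)
    return (va[0] if va else None, vc[0] if vc else None)


# fQ = 3x1 + 10x2, fV1 = 2x1 + 5x2, fV2 = x1 + 2x2, plus a hypothetical V3.
views = {"V1": (2, 5), "V2": (1, 2), "V3": (1, 5)}
va, vc = closest_views((3, 10), views)
```

    Here V1 and V2 are both clockwise of Q, so only the closer one (V1) survives as Vc, which is the same-side conclusion from the earlier slide.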

    View Selection: Higher Dimension

    Extension of Theorem 1

    Theorem 2: Set of views = {V1, ..., Vr}, query = Q, m-dimensional dataset.

    Optimal execution of LPTA requires the use of a subset of views U ⊆ V such that |U| ≤ m.

    Outline

    LPTA Algorithm

    View Selection Problem

    Cost Estimation Framework

    Cost Estimation Framework: Running LPTA

    Cost estimation framework: the cost of running LPTA when a specific set of views is used to answer a query.

    Cost = total number of sequential accesses on the views

    [Figure: query vector Q with view vectors V1 and V2, line AB and the minimum top-k tuple. LPTA uses 2 views to answer the query; cost = 6 sequential accesses.]

    Can we find that cost without actually running LPTA?

    Cost Estimation Framework: without Running LPTA

    EstimateCost(Q, U): returns an estimate of the cost of running LPTA on exactly this set of views U.

    Used within SelectViews(Q, V) to search for the subset U that minimizes EstimateCost(Q, U).

    EstimateCost(Q, U) takes into account:

    Multi-attribute views

    Non-uniform data distribution

    Simulating LPTA on Histograms rather than on views U

    Equi-depth histograms: the number of tuples in each bucket is the same.

    Base table R: n tuples (here n = 10)

    Hi: equi-depth histogram with b buckets (here b = 2) that represents the distribution of points along the Xi attribute

    Each bucket represents n/b data points: 10/2 = 5 data points

    Simulating LPTA on Histograms rather than on views U

    In our estimation procedure:

    HQ represents the distribution of the scores of all tuples of the database according to the scoring function of Q.

    We cannot calculate the score of all tuples, so HQ is approximated.

    Simulation of LPTA on Histograms

    Simulate LPTA in a bucket-by-bucket lock-step to estimate the cost.

    [Figure: histograms HQ, HV1 and HV2, with topkmin marked on HQ and the estimated cost marked on the view histograms.]

    HQ: approximates the score distribution of the query Q.

    HV1, HV2: b-bucket histograms for the score distributions of the views, with n/b tuples per bucket.

    We cannot afford to run LPTA on the views U.

    topkmin is pre-estimated because we do not have access to the actual tuples or their tids. Its value is estimated from HQ by determining the bucket that contains the k-th highest tuple. Since topkmin is very likely inside this bucket, we use linear interpolation within the bucket to estimate it.

    This is a cheap procedure, because there is one iteration of the LPTA algorithm for every n/b tuples, using the values from the bucket boundaries.
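
    The interpolation step for topkmin can be sketched as follows. This is one plausible reading of the slide, not the authors' procedure: it assumes HQ is given as b + 1 ascending bucket boundaries of an equi-depth histogram and that the n/b tuples of a bucket are spread evenly between its boundaries. The function name and the boundary values in the usage lines are made up.

```python
import math


def estimate_topkmin(boundaries, n, k):
    """Estimate the k-th highest score from an equi-depth histogram.

    boundaries: b + 1 ascending score values delimiting b buckets,
                each bucket holding n / b tuples.
    """
    b = len(boundaries) - 1
    per_bucket = n / b
    # Which bucket, counted from the highest scores, holds the k-th tuple.
    from_top = math.ceil(k / per_bucket)
    hi = boundaries[b - from_top + 1]
    lo = boundaries[b - from_top]
    # Rank of the k-th tuple inside that bucket, then interpolate linearly.
    rank_in_bucket = k - (from_top - 1) * per_bucket
    return hi - rank_in_bucket * (hi - lo) / per_bucket


# Hypothetical HQ with b = 2 buckets over n = 10 tuples.
hq_boundaries = [100, 740, 1250]
top2_min = estimate_topkmin(hq_boundaries, n=10, k=2)  # falls in the top bucket
top7_min = estimate_topkmin(hq_boundaries, n=10, k=7)  # falls in the lower bucket
```

    The estimate is only as good as the even-spread assumption inside the bucket, which is why more buckets give a tighter topkmin.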

    Calculating the Estimated Cost

    Number of buckets visited along each view = d (here 3)

    Number of views = r1 (here 2)

    Number of tuples per bucket = n/b (here 10)

    Compute the smallest number of tuples n1 that need to be scanned from the last bucket before stopping.

    Estimated number of sorted accesses: ((d - 1) n/b + n1) r1

    ((2)(10) + 2) × 2 = 44

    Therefore the running time is O((d - 1) + log n1) lock-step iterations.
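
    The cost formula is simple enough to check directly. In this small sketch the function and parameter names are illustrative, and n1 = 2 is inferred from the slide's arithmetic rather than stated on it.

```python
def estimated_sorted_accesses(d, tuples_per_bucket, n1, r1):
    """((d - 1) * n/b + n1) * r1: full buckets, plus the partial last bucket,
    read once per participating view."""
    return ((d - 1) * tuples_per_bucket + n1) * r1


cost = estimated_sorted_accesses(d=3, tuples_per_bucket=10, n1=2, r1=2)
```

    With the slide's numbers this gives 44 sorted accesses.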

    Outline

    LPTA Algorithm

    View Selection Problem

    Cost Estimation Framework

    View Selection Algorithms

    EstimateCost(Q, U) Pseudo-code

    SelectViews(Q, V): select the subset of views U that minimizes EstimateCost.

    Exhaustive (E) approach: estimate the cost of all possible subsets of V and select the subset of views with the smallest cost.

    Feasible for databases with few attributes.

    Greedy approach: keep expanding the set of views to use until the estimated cost stops reducing.

    SelectViews(Q, V) Pseudo code
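
    The two selection strategies can be contrasted in a short sketch. The linked pseudo code is not available in this transcript, so this is an interpretation of the two descriptions above: the cost estimator is passed in as a function, and the cost table used below is invented purely to show that the greedy search can settle on a different subset than the exhaustive one.

```python
from itertools import combinations


def select_views_exhaustive(views, estimate_cost):
    """Try every non-empty subset of views and keep the cheapest."""
    subsets = (
        frozenset(combo)
        for size in range(1, len(views) + 1)
        for combo in combinations(views, size)
    )
    return min(subsets, key=estimate_cost)


def select_views_greedy(views, estimate_cost):
    """Add one view at a time while the estimated cost keeps dropping."""
    chosen, best_cost = frozenset(), float("inf")
    while True:
        candidates = [chosen | {v} for v in views if v not in chosen]
        if not candidates:
            break
        candidate = min(candidates, key=estimate_cost)
        cost = estimate_cost(candidate)
        if cost >= best_cost:
            break  # estimated cost stopped reducing
        chosen, best_cost = candidate, cost
    return chosen


# Hypothetical estimated costs for every subset of three views.
TOY_COSTS = {
    frozenset({"V1"}): 12, frozenset({"V2"}): 20, frozenset({"V3"}): 15,
    frozenset({"V1", "V2"}): 8, frozenset({"V1", "V3"}): 10,
    frozenset({"V2", "V3"}): 6, frozenset({"V1", "V2", "V3"}): 9,
}
views = ["V1", "V2", "V3"]
best = select_views_exhaustive(views, TOY_COSTS.__getitem__)
greedy = select_views_greedy(views, TOY_COSTS.__getitem__)
```

    The exhaustive search evaluates 2^r - 1 subsets, while the greedy search evaluates at most r candidates per round, which is the trade-off the slide alludes to.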

    Requires the solution of a single linear program. Fix the score s

    Uniform data distribution & very cheap

    Maximize the scoring function of the query, Max(fq), using the inequalities that scoring function of each view

    Outline

    LPTA Algorithm

    View Selection Problem

    Cost Estimation Framework

    View Selection Algorithms

    Experimental Evaluation

    Experimental Evaluation

    Two types of datasets: real and synthetic (uniform and Zipf data with varying skew)

    The real dataset contains 30K tuples from a website specializing in automobiles.

    Experiments conducted:

    Performance comparison of LPTA, PREFER and TA

    Performance of LPTA using each of the view selection algorithms

    Scalability of the LPTA algorithm

    Performance comparison of LPTA, PREFER and TA

    [Figures: uniform dataset, 3d; real dataset, 2d]

    Conclusions

    Using views for top-k query answering

    LPTA: linear programming adaptation of TA

    View selection problem, cost estimation framework, view selection algorithms

    Experimental evaluation

    References

    Answering Top-k Queries Using Views: Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis

    Optimal Aggregation Algorithms for Middleware: Ronald Fagin, Amnon Lotem, Moni Naor

    aitrc.kaist.ac.kr/~vldb06/slides/R13-1.ppt