Query Exec 1

  • 5/28/2018 Query Exec 1

    Query Execution

    Since our SQL queries are very high level, the query processor does a lot of processing to supply all the details.

    An SQL query is translated internally into a relational algebra expression.

    One advantage of using relational algebra is that it makes alternative forms of a query easier to explore.

    The different algebraic expressions for a query are called logical query plans.

    We will focus first on the methods for executing the operations of the relational algebra.

    Query compilation (Chapter 16)

    Query execution (Chapter 15)

    Preview of Query Compilation

    Parsing: read SQL, output a relational algebra tree.

    Query rewrite: transform the tree into a form that is more efficient to evaluate.

    Physical plan generation: select an implementation for each operator in the tree, and for passing results up the tree.

    In this chapter we will focus on the implementation of each operator.

    Relational algebra for real SQL

    Basic SELECT-FROM-WHERE queries correspond to $\pi_L(\sigma_C(R_1 \times \cdots \times R_n))$ in relational algebra.

    For full SQL support we need additional constructs.

    A relation in algebra is a set.

    A relation in SQL might be a bag.

    Bag = set with duplicates allowed.

    Relational Algebra (RA) on bags

    RA union, intersection, and difference correspond to UNION, INTERSECT, and EXCEPT in SQL.

    These are in fact set operators in SQL. If you want the bag versions, use ALL.

    The selection $\sigma$ corresponds to the WHERE clause in SQL.

    The projection $\pi$ corresponds to the SELECT clause.

    The product $\times$ corresponds to the FROM clause.

    The joins $\bowtie$ correspond to JOIN, NATURAL JOIN, and OUTER JOIN in the SQL2 standard.

    The duplicate elimination $\delta$ corresponds to DISTINCT in the SELECT clause.

    The grouping $\gamma$ corresponds to GROUP BY.

    The sorting $\tau$ corresponds to ORDER BY.

    Bag union, intersection, and difference

    Card(t, R) means the number of occurrences of tuple t in relation R.

    $\mathrm{Card}(t, R \cup S) = \mathrm{Card}(t, R) + \mathrm{Card}(t, S)$

    $\mathrm{Card}(t, R \cap S) = \min\{\mathrm{Card}(t, R), \mathrm{Card}(t, S)\}$

    $\mathrm{Card}(t, R - S) = \max\{\mathrm{Card}(t, R) - \mathrm{Card}(t, S), 0\}$

    Example: R = {A,B,B}, S = {C,A,B,C}

    $R \cup S$ = {A,A,B,B,B,C,C}

    $R \cap S$ = {A,B}

    $R - S$ = {B}
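The three Card rules can be checked with a short sketch. It assumes Python's `collections.Counter` as a stand-in for a bag (an illustration choice, not something the slides prescribe); its `+`, `&`, and `-` operators add counts, take the minimum, and subtract with a floor of zero.

```python
from collections import Counter

R = Counter({"A": 1, "B": 2})          # the bag {A, B, B}
S = Counter({"C": 2, "A": 1, "B": 1})  # the bag {C, A, B, C}

bag_union = R + S         # Card adds up
bag_intersection = R & S  # Card is the minimum of the two counts
bag_difference = R - S    # Card is max(difference, 0)

print(sorted(bag_union.elements()))         # ['A', 'A', 'B', 'B', 'B', 'C', 'C']
print(sorted(bag_intersection.elements()))  # ['A', 'B']
print(sorted(bag_difference.elements()))    # ['B']
```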

    Beware: Bag Laws != Set Laws

    Not all algebraic laws that hold for sets also hold for bags.

    For one example, the commutative law for union ($R \cup S = S \cup R$) does hold for bags.

    - Since addition is commutative, adding the number of times that tuple x appears in R and S doesn't depend on the order of R and S.

    Set union is idempotent, meaning that $S \cup S = S$.

    However, for bags, if x appears n times in S, then it appears 2n times in $S \cup S$.

    Thus $S \cup S \neq S$ in general.

    Selection ($\sigma$)

    $\sigma_C(R)$

    The condition C might involve:

    Arithmetic (+, -, *) or string operators such as LIKE.

    Comparison between terms, e.g. a < b or a+b = 10.

    Boolean connectives AND, OR, and NOT.

    Example: R =

    a b
    ----
    0 1
    2 3
    4 5
    2 3

    $\sigma_{a \geq 1}(R)$ =

    a b
    ----
    2 3
    4 5
    2 3

    $\sigma_{b \geq 3 \text{ AND } a+b \geq 6}(R)$ =

    a b
    ----
    4 5
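A minimal sketch of bag selection on the example relation follows. The helper `select` and the exact comparison operators (`a >= 1`, and `b >= 3 AND a + b >= 6`) are assumptions chosen to reproduce the result tables shown; the point is that duplicates such as (2, 3) pass through the filter untouched.

```python
# R as a bag of (a, b) tuples; a list keeps the duplicate (2, 3).
R = [(0, 1), (2, 3), (4, 5), (2, 3)]

def select(relation, condition):
    """Bag selection: keep every tuple, duplicates included, that satisfies the condition."""
    return [t for t in relation if condition(*t)]

first = select(R, lambda a, b: a >= 1)
second = select(R, lambda a, b: b >= 3 and a + b >= 6)

print(first)   # [(2, 3), (4, 5), (2, 3)]
print(second)  # [(4, 5)]
```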

    Projection ($\pi$)

    $\pi_L(R)$

    Argument L of $\pi$ is a sequence of elements of the following form:

    A single attribute in R, or

    An expression $x \rightarrow y$, where x and y are attribute names, or

    An expression $E \rightarrow z$, where E is an expression involving attributes in R and z is a new attribute name not in R.

    Example: R =

    a b c
    ------
    0 1 2
    0 1 2
    3 4 5

    $\pi_{a,\ b+c \rightarrow x}(R)$ =

    a x
    ----
    0 3
    0 3
    3 9

    $\pi_{b-a \rightarrow x,\ c-b \rightarrow y}(R)$ =

    x y
    ----
    1 1
    1 1
    1 1
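The extended projection can be sketched as a list of expressions applied to every tuple. The helper `project` is hypothetical, and the expressions (a with b+c, then b-a with c-b) are the ones that reproduce the two result tables in the example. Note that the bag projection keeps all three output tuples even when they are identical.

```python
# R as a bag of (a, b, c) tuples.
R = [(0, 1, 2), (0, 1, 2), (3, 4, 5)]

def project(relation, *exprs):
    """Bag projection: evaluate each expression on each tuple, without removing duplicates."""
    return [tuple(e(*t) for e in exprs) for t in relation]

# Output columns (a, x) where x = b + c
p1 = project(R, lambda a, b, c: a, lambda a, b, c: b + c)
# Output columns (x, y) where x = b - a and y = c - b
p2 = project(R, lambda a, b, c: b - a, lambda a, b, c: c - b)

print(p1)  # [(0, 3), (0, 3), (3, 9)]
print(p2)  # [(1, 1), (1, 1), (1, 1)]
```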

    Product ($\times$)

    Each copy of the tuple (1,2) of R is paired with each tuple of S. So, the duplicates do not have an effect on the way we compute the product.

    R( A, B )
       1  2
       5  6
       1  2

    S( B, C )
       3  4
       7  8

    $R \times S$ =

    A R.B S.B C
    1  2   3  4
    1  2   7  8
    5  6   3  4
    5  6   7  8
    1  2   3  4
    1  2   7  8

    Natural Join

    The natural join of R and S can be expressed by starting with the product $R \times S$, then applying the selection operator with a condition C of the form

    R.A1=S.A1 AND R.A2=S.A2 AND ... AND R.An=S.An

    where A1, A2, ..., An are all the attributes appearing in the schema of both R and S. Finally, we must project out one copy of each of the equated attributes.

    $R \bowtie S = \pi_L(\sigma_C(R \times S))$

    where L is the list of attributes in R followed by the list of attributes in S that are not in R.
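The product, selection, and projection steps of this definition can be sketched directly. The function `natural_join`, its schema-as-tuple representation, and the sample data are all assumptions made for illustration; a real system would not materialize the full product.

```python
from itertools import product

def natural_join(r_schema, r_rows, s_schema, s_rows):
    """Natural join computed literally as pi_L(sigma_C(R x S))."""
    shared = [a for a in r_schema if a in s_schema]      # A1, ..., An
    s_only = [a for a in s_schema if a not in r_schema]
    out_schema = list(r_schema) + s_only                 # the list L
    result = []
    for r, s in product(r_rows, s_rows):                 # R x S
        rd, sd = dict(zip(r_schema, r)), dict(zip(s_schema, s))
        if all(rd[a] == sd[a] for a in shared):          # condition C
            # Keep R's attributes, then the attributes of S that are not in R.
            result.append(tuple(r) + tuple(sd[a] for a in s_only))
    return out_schema, result

# Hypothetical data: R(A, B) is a bag with a duplicate, S(B, C) is a set.
schema, rows = natural_join(("A", "B"), [(1, 2), (5, 6), (1, 2)],
                            ("B", "C"), [(2, 4), (6, 8)])
print(schema)  # ['A', 'B', 'C']
print(rows)    # [(1, 2, 4), (5, 6, 8), (1, 2, 4)]
```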

    Theta-Join

    Again, each copy of the tuple (1,2) of R is paired with each tuple of S, and they join successfully.

    So, the duplicates do not have an effect on the way we compute the theta-join.

    R( A, B )
       1  2
       5  6
       1  2

    S( B, C )
       3  4
       7  8

    R R.B

    Duplicate Elimination

    R1 := $\delta$(R2).

    R1 consists of one copy of each tuple that appears in R2 one or more times.

    R =

    A B
    1 2
    3 4
    1 2

    $\delta$(R) =

    A B
    1 2
    3 4

    Grouping Operator

    R1 := $\gamma_L$(R2). L is a list of elements that are either:

    1. Individual (grouping) attributes.

    2. AGG(A), where AGG is one of the aggregation operators and A is an attribute.

    a. The most important examples: SUM, AVG, COUNT, MIN, and MAX.

    SELECT starName, MIN(year) AS minYear

    FROM StarsIn

    GROUP BY starName

    HAVING COUNT(title) >= 3;

    Applying $\gamma_L$(R)

    Group R according to all the grouping attributes on list L.

    - That is, form one group for each distinct list of values for those attributes in R.

    Within each group, compute AGG(A) for each aggregation on list L.

    The result has the grouping attributes and the aggregations as attributes.

    - There is one tuple for each list of values for the grouping attributes, holding those values and that group's aggregations.

    Example: Grouping/Aggregation

    R =

    A B C
    1 2 3
    4 5 6
    1 2 5

    $\gamma_{A,B,AVG(C)}$(R) = ??

    First, group R:

    A B C
    1 2 3
    1 2 5
    4 5 6

    Then, average C within groups:

    A B AVG(C)
    1 2 4
    4 5 6
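The two steps of this example (group on A and B, then average C within each group) can be sketched with a dictionary keyed by the grouping attributes. This is an in-memory illustration only; the variable names are assumptions.

```python
from collections import defaultdict

# R as a bag of (A, B, C) tuples.
R = [(1, 2, 3), (4, 5, 6), (1, 2, 5)]

# Step 1: form one group per distinct (A, B) value list.
groups = defaultdict(list)
for a, b, c in R:
    groups[(a, b)].append(c)

# Step 2: compute AVG(C) within each group; one output tuple per group.
result = [(a, b, sum(cs) / len(cs)) for (a, b), cs in groups.items()]

print(result)  # [(1, 2, 4.0), (4, 5, 6.0)]
```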

    Example: Grouping/Aggregation

    StarsIn(title, year, starName)

    Suppose we want, for each star who has appeared in at least three movies, the earliest year in which the star appeared.

    - First we group, using starName as a grouping attribute.

    - Then, we have to compute MIN(year) for each group.

    - However, we also need to compute the COUNT(title) aggregate for each group, in order to filter out those stars with fewer than three movies.

    $\sigma_{ctTitle \geq 3}[\gamma_{starName,\ MIN(year) \rightarrow minYear,\ COUNT(title) \rightarrow ctTitle}(StarsIn)]$

    Expression trees

    MovieStar(name, addr, gender, birthdate)

    StarsIn(title, year, starName)

    SELECT title, birthdate

    FROM MovieStar, StarsIn

    WHERE year = 1996 AND

    gender = 'F' AND

    starName = name;

    Join method?

    Can we pipeline the result of one or both selections, and avoid storing the result on disk temporarily?

    Are there indexes on MovieStar.gender and/or StarsIn.year that will make the $\sigma$'s efficient?

    How to generate such alternative expression trees will be covered in Chapter 16.

    Physical query plan operators

    Physical query plans are built from physical operators.

    - Often the physical operators are particular implementations of the relational algebra operators.

    However, there are also other physical operators for other tasks. E.g.

    - Table-scan (the most basic operation we want to perform in a physical query plan)

    - Index-scan (e.g. if we have a sparse index on some relation R, we can retrieve the blocks of R by using the index)

    - Sort-scan (takes a relation R and a specification of the attributes on which the sort is to be made, and produces R in sorted order)

    Model of Computation

    When comparing algorithms for the same operations we will make an assumption:

    We assume that the arguments of any operator are found on disk, but the result of the operator is left in main memory.

    This is because the cost of writing the output to disk depends on the size of the result, not on the way the result was computed.

    Also, we can pipeline the result (through iterators) to other operators when the result is constructed in main memory a small piece at a time.

    Cost parameters

    M = number of main memory buffers available (1 buffer = 1 block)

    B(R) = number of blocks of R

    T(R) = number of tuples of R

    V(R, a) = number of different values in column a of R

    V(R, L) = number of different L-values in R (L is a list of attributes)

    The cost of scanning R:

    B(R) if R is clustered, and

    T(R) otherwise

    Iterators for Implementation of Physical Operators

    An iterator is a group of three functions that allow a consumer of the result of a physical operation to get the result one tuple at a time.

    An iterator consists of three parts:

    Open: Initializes data structures. Doesn't return tuples.

    GetNext: Returns the next tuple and adjusts the data structures.

    Close: Cleans up afterwards.

    We assume these to be overloaded names of methods.

    Iterator for table-scan operator

    Open(R) {
        b := the first block of R;
        t := the first tuple of block b;
        Found := TRUE;
    }

    GetNext(R) {
        IF (t is past the last tuple on block b) {
            increment b to the next block;
            IF (there is no next block) {
                Found := FALSE;
                RETURN;
            }
            ELSE /* b is a new block */
                t := first tuple on block b;
        }
        oldt := t; /* Now we are ready to return t and increment */
        increment t to the next tuple of b;
        RETURN oldt;
    }

    Close(R) {}

    Iterator for Bag Union of R and S

    Open(R,S) {
        R.Open();
        CurRel := R;
    }

    GetNext(R,S) {
        IF (CurRel = R) {
            t := R.GetNext();
            IF (Found) /* R is not exhausted */
                RETURN t;
            ELSE /* R is exhausted */ {
                S.Open();
                CurRel := S;
            }
        }
        /* Here we read from S */
        RETURN S.GetNext();
        /* If S is exhausted, Found will be set to FALSE by S.GetNext */
    }

    Close(R,S) {
        R.Close();
        S.Close();
    }
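The Open/GetNext/Close protocol for table-scan and bag union can be sketched in runnable form as follows. This is an illustration under stated assumptions: a relation is modeled as a list of blocks (each a list of tuples), the class and method names are hypothetical, and `None` is returned in place of setting the `Found` flag to FALSE.

```python
class TableScan:
    """Iterator over a relation stored as a list of blocks (lists of tuples)."""

    def __init__(self, blocks):
        self.blocks = blocks

    def open(self):
        self.b = 0  # index of the current block
        self.t = 0  # index of the next tuple within that block

    def get_next(self):
        # Move past exhausted (or empty) blocks.
        while self.b < len(self.blocks) and self.t >= len(self.blocks[self.b]):
            self.b += 1
            self.t = 0
        if self.b >= len(self.blocks):
            return None  # plays the role of Found := FALSE
        tup = self.blocks[self.b][self.t]
        self.t += 1
        return tup

    def close(self):
        pass


class BagUnion:
    """Emits all tuples of R, then all tuples of S."""

    def __init__(self, r, s):
        self.r, self.s = r, s

    def open(self):
        self.r.open()
        self.cur = self.r

    def get_next(self):
        if self.cur is self.r:
            tup = self.r.get_next()
            if tup is not None:   # R is not exhausted
                return tup
            self.s.open()         # R is exhausted: switch to S
            self.cur = self.s
        return self.s.get_next()  # None once S is exhausted too

    def close(self):
        self.r.close()
        self.s.close()


def drain(it):
    """Consumer loop: pull tuples one at a time until the iterator is exhausted."""
    it.open()
    out = []
    while (tup := it.get_next()) is not None:
        out.append(tup)
    it.close()
    return out


rows = drain(BagUnion(TableScan([[(1, 2), (3, 4)], [(1, 2)]]),
                      TableScan([[(3, 4)], []])))
print(rows)  # [(1, 2), (3, 4), (1, 2), (3, 4)]
```

Because `BagUnion` only calls `open`, `get_next`, and `close` on its inputs, either argument could itself be another operator's iterator, which is what makes pipelining possible.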

    Iterator for sort-scan

    In an iterator for sort-scan:

    Open has to do all of the two-phase multiway merge sort (2PMMS), except the merging.

    GetNext outputs the next tuple from the merging phase.

    Algorithms for implementing RA-operators

    Classification of algorithms

    Sorting-based methods

    Hash-based methods

    Index-based methods

    Degree of difficulty of algorithms

    One pass (when one relation can fit into main memory)

    Two pass (when no relation can fit in main memory, but the relations are not extremely large)

    Multi pass (when the relations are extremely large)

    Classification of operators

    Tuple-at-a-time, unary operations ($\sigma$, $\pi$)

    Full-relation, unary operations ($\delta$, $\gamma$)

    Full-relation, binary operations (union, join, ...)

    One pass, tuple-at-a-time

    Selection and projection: $\sigma_C(R)$, $\pi_L(R)$

    Cost = B(R), or T(R) if the relation is not clustered

    Space requirement: $M \geq 1$ block

    Principle:

    Read one block (or one tuple if the relation is not clustered) at a time.

    Filter in or out the tuples of this block.

    One pass, unary full-relation operations

    Duplicate elimination: for each tuple decide:

    seen before: ignore

    new: output

    Principle:

    If it is the first time we have seen this tuple, we copy it to the output.

    If we have seen the tuple before, we must not output it.

    We need a main-memory hash table to be efficient.

    Requirement: $B(\delta(R)) \leq M$
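A minimal sketch of one-pass duplicate elimination, assuming the relation arrives as a sequence of blocks and that the set of distinct tuples fits in memory. Python's built-in `set` stands in for the main-memory hash table; the function name is hypothetical.

```python
def one_pass_distinct(blocks):
    """Yield each distinct tuple once, reading the input one block at a time."""
    seen = set()              # main-memory hash table of tuples already output
    for block in blocks:      # only one input block is held at a time
        for tup in block:
            if tup not in seen:   # first time we see this tuple: output it
                seen.add(tup)
                yield tup
            # otherwise: seen before, ignore

out = list(one_pass_distinct([[(1, 2), (3, 4)], [(1, 2)]]))
print(out)  # [(1, 2), (3, 4)]
```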

    One pass, binary operators

    Requirement: $\min(B(R), B(S)) \leq M$

    Exception: bag union

    Cost: B(R) + B(S)

    Assume R is larger than S.

    How to perform the operations below:

    Set union, set intersection, set difference

    Bag intersection, bag difference

    Cartesian product, natural join

    All these operators require reading the smaller of the relations into main memory and storing it there in a search structure (like a hash table or a balanced binary tree) for easy search and insertion.

    Set Union

    Let R and S be sets.

    We read S into M-1 buffers of main memory.

    All these tuples are also copied to the output.

    We then read each block of R into the M-th buffer, one at a time.

    For each tuple t of R we see if t is in S, and if not, we copy t to the output.
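The set-union steps above can be sketched as follows, under the assumption that both inputs are duplicate-free and arrive as lists of blocks. The function name is hypothetical, and a Python `set` stands in for the in-memory search structure built on S.

```python
def one_pass_set_union(r_blocks, s_blocks):
    """One-pass set union; S is the smaller relation held in memory."""
    s_mem = set()
    output = []
    for block in s_blocks:        # load S and copy it to the output
        for t in block:
            s_mem.add(t)
            output.append(t)
    for block in r_blocks:        # stream R one block at a time
        for t in block:
            if t not in s_mem:    # only tuples not already emitted via S
                output.append(t)
    return output

union = one_pass_set_union([[(1,), (2,)], [(3,)]], [[(2,), (4,)]])
print(union)  # [(2,), (4,), (1,), (3,)]
```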

    Set Intersection

    Let R and S be sets or bags.

    The result will be a set.

    We read S into M-1 buffers of main memory.

    We then read each block of R into the M-th buffer, one at a time.

    For each tuple t of R we see if t is in S, and if so, we copy t to the output. At the same time we delete t from S in main memory.

    Set Difference

    Let R and S be sets.

    Since difference is not a commutative operator, we must distinguish between R-S and S-R, assuming that S is the smaller relation.

    Read S into M-1 buffers of main memory.

    Then read each block of R into the M-th buffer, one at a time.

    To compute R-S:

    for each tuple t of R we see if t is in S, and if it is not, we copy t to the output.

    To compute S-R:

    for each tuple t of R we see if t is in S, and if so, we delete t from S. At the end we output those tuples of S that remain.
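Both directions of the one-pass set difference can be sketched as below, with S as the in-memory relation in each case. The function names are hypothetical and a Python `set` is assumed as the search structure; the S-R result is sorted only to make the output order deterministic.

```python
def set_r_minus_s(r_blocks, s_blocks):
    """R - S: stream R and emit the tuples that are not in the in-memory S."""
    s_mem = {t for block in s_blocks for t in block}
    return [t for block in r_blocks for t in block if t not in s_mem]

def set_s_minus_r(r_blocks, s_blocks):
    """S - R: stream R, delete matches from the in-memory S, then emit what remains."""
    s_mem = {t for block in s_blocks for t in block}
    for block in r_blocks:
        for t in block:
            s_mem.discard(t)   # no-op when t is not in S
    return sorted(s_mem)

r_blocks = [[1, 2], [3]]
s_blocks = [[2, 4]]
print(set_r_minus_s(r_blocks, s_blocks))  # [1, 3]
print(set_s_minus_r(r_blocks, s_blocks))  # [4]
```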

    Bag Intersection

    Let R and S be bags.

    Read S into M-1 buffers of main memory.

    Also, associate with each distinct tuple a count, which initially measures the number of times the tuple occurs in S.

    Then read each block of R into the M-th buffer, one at a time.

    For each tuple t of R we see if t is in S. If not, we ignore it.

    Otherwise, if its counter is more than zero, we output t and decrement the counter.
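The counter-based procedure can be sketched as follows, assuming `collections.Counter` for the per-tuple counts on S and lists of blocks for the inputs (both are illustration choices). The sample data reuses the bags R = {A,B,B} and S = {C,A,B,C}.

```python
from collections import Counter

def one_pass_bag_intersection(r_blocks, s_blocks):
    """Bag intersection with S in memory; each tuple appears min(count in R, count in S) times."""
    counts = Counter(t for block in s_blocks for t in block)  # occurrences in S
    output = []
    for block in r_blocks:          # stream R one block at a time
        for t in block:
            if counts[t] > 0:       # t still has an unmatched copy in S
                output.append(t)
                counts[t] -= 1
    return output

result = one_pass_bag_intersection([["A", "B"], ["B"]], [["C", "A"], ["B", "C"]])
print(result)  # ['A', 'B']
```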

    Bag Difference

    We read S into M-1 buffers of main memory.

    Also, we associate with each distinct tuple a count, which initially measures the number of times the tuple occurs in S.

    We then read each block of R into the M-th buffer, one at a time.

    To compute S-R:

    for each tuple t of R we see if t is in S, and if so, we decrement its counter (if it is still positive).

    At the end we output each tuple of S whose counter is still positive, as many times as the counter indicates.

    To compute R-S:

    we may think of the counter c for tuple t as having c reasons not to output t.

    Now, when we process a tuple t of R, we check to see if that tuple appears in S. If not, we output t.

    Otherwise, we check the counter c of t. If it is 0, we output t.

    If not, we don't output t, and we decrement c.
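Both directions of the bag difference can be sketched with the same counter idea. `collections.Counter` and the function names are assumptions for illustration; the S-R output is sorted only to make its order deterministic.

```python
from collections import Counter

def bag_r_minus_s(r_blocks, s_blocks):
    """R - S: each copy of t in S is one reason not to output a copy of t from R."""
    counts = Counter(t for block in s_blocks for t in block)
    output = []
    for block in r_blocks:
        for t in block:
            if counts[t] > 0:
                counts[t] -= 1      # use up one reason, do not output t
            else:
                output.append(t)    # t not in S, or its counter reached 0
    return output

def bag_s_minus_r(r_blocks, s_blocks):
    """S - R: decrement S's counters while streaming R, then emit the remaining copies."""
    counts = Counter(t for block in s_blocks for t in block)
    for block in r_blocks:
        for t in block:
            if counts[t] > 0:
                counts[t] -= 1
    return sorted(counts.elements())  # elements() skips counts that dropped to 0

r_blocks = [["A", "B"], ["B"]]       # the bag {A, B, B}
s_blocks = [["C", "A"], ["B", "C"]]  # the bag {C, A, B, C}
print(bag_r_minus_s(r_blocks, s_blocks))  # ['B']
print(bag_s_minus_r(r_blocks, s_blocks))  # ['C', 'C']
```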

    Product

    We read S into M-1 buffers of main memory. No special structure is needed.

    We then read each block of R into the M-th buffer, one at a time, and combine each tuple with all the tuples of S.

    Natural Join

    We read S into M-1 buffers of main memory and build a search structure where the search key is the shared attributes of R and S.

    We then read each block of R into the M-th buffer, one at a time. For each tuple t of R we use the search structure to find the tuples of S that agree with t on the shared attributes, and for each such tuple we copy the joined tuple to the output.
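A minimal sketch of the one-pass natural join, assuming hypothetical schemas R(A, B) and S(B, C) that share only the attribute B. A dictionary keyed on B plays the role of the in-memory search structure on S; the function name and sample data are illustration choices.

```python
from collections import defaultdict

def one_pass_natural_join(r_blocks, s_blocks):
    """Join R(A, B) with S(B, C) on B, holding S in memory and streaming R."""
    index = defaultdict(list)        # search structure: B value -> C values in S
    for block in s_blocks:
        for b, c in block:
            index[b].append(c)
    output = []
    for block in r_blocks:           # one block of R at a time
        for a, b in block:
            for c in index.get(b, []):   # every S tuple agreeing with t on B
                output.append((a, b, c))
    return output

joined = one_pass_natural_join([[(1, 2), (5, 6)], [(1, 2)]], [[(2, 4), (6, 8)]])
print(joined)  # [(1, 2, 4), (5, 6, 8), (1, 2, 4)]
```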