Lecture05 Query Processing Ch23


  • 8/12/2019 Lecture05 Query Processing Ch23


    Chapter 23

    Query Processing

    Pearson Education 2009


    Chapter 23 - Objectives

    Objectives of query processing and optimization.

    Static versus dynamic query optimization.

    How a query is decomposed and semantically analyzed.

    How to create a relational algebra tree (R.A.T.) to represent a query.

    Rules of equivalence for relational algebra (RA) operations.

    How to apply heuristic transformation rules to improve efficiency of a query.


    Types of database statistics required to estimate cost of operations.

    Different strategies for implementing selection.

    How to evaluate cost and size of selection.

    Different strategies for implementing join.

    How to evaluate cost and size of join.

    Different strategies for implementing projection.

    How to evaluate cost and size of projection.


    How to evaluate the cost and size of other RA operations.

    How pipelining can be used to improve efficiency of queries.

    Difference between materialization and pipelining.

    Advantages of left-deep trees.

    Approaches to finding optimal execution strategy.

    How Oracle handles query optimization (QO).


    Introduction

    In network and hierarchical DBMSs, low-level procedural query language is generally embedded in high-level programming language.

    Programmer's responsibility to select most appropriate execution strategy.

    With declarative languages such as SQL, user specifies what data is required rather than how it is to be retrieved.

    Relieves user of knowing what constitutes good execution strategy.


    Two main techniques for query optimization:

    heuristic rules that order operations in a query;

    comparing different strategies based on relative costs, and selecting one that minimizes resource usage.

    Disk access tends to be dominant cost in query processing for centralized DBMS.


    Query Processing

    Activities involved in retrieving data from the database.

    Aims of QP:

    transform query written in high-level language (e.g. SQL) into correct and efficient execution strategy expressed in low-level language (implementing RA);

    execute strategy to retrieve required data.


    Query Optimization

    Activity of choosing an efficient execution strategy for processing query.

    As there are many equivalent transformations of same high-level query, aim of QO is to choose one that minimizes resource usage.

    Generally, reduce total execution time of query.

    May also reduce response time of query.

    Problem computationally intractable with large number of relations, so strategy adopted is reduced to finding near optimum solution.


    Example 23.1 - Different Strategies

    Find all Managers who work at a London branch.

    SELECT *
    FROM Staff s, Branch b
    WHERE s.branchNo = b.branchNo AND
          (s.position = 'Manager' AND b.city = 'London');


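    Three equivalent RA queries are:

    (1) $\sigma_{(\text{position='Manager'}) \wedge (\text{city='London'}) \wedge (\text{Staff.branchNo=Branch.branchNo})}(\text{Staff} \times \text{Branch})$

    (2) $\sigma_{(\text{position='Manager'}) \wedge (\text{city='London'})}(\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch})$

    (3) $(\sigma_{\text{position='Manager'}}(\text{Staff})) \bowtie_{\text{Staff.branchNo=Branch.branchNo}} (\sigma_{\text{city='London'}}(\text{Branch}))$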


    Assume:

    1000 tuples in Staff; 50 tuples in Branch;

    50 Managers; 5 London branches;

    no indexes or sort keys;

    results of any intermediate operations stored on disk;

    cost of the final write is ignored;

    tuples are accessed one at a time.


    Example 23.1 - Cost Comparison

    Costs (in disk accesses) are:

    (1) (1000 + 50) + 2*(1000 * 50) = 101 050

    (2) 2*1000 + (1000 + 50) = 3 050

    (3) 1000 + 2*50 + 5 + (50 + 5) = 1 160

    Cartesian product and join operations are much more expensive than selection, and third option significantly reduces size of relations being joined together.
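The three totals can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are made up here, and the breakdown of each formula into reads and writes is one plausible reading of the stated assumptions (intermediate results written to disk and read back, final write ignored).

```python
# Figures taken from the example's assumptions.
STAFF = 1000     # tuples in Staff
BRANCH = 50      # tuples in Branch
MANAGERS = 50    # Staff tuples with position = 'Manager'
LONDON = 5       # Branch tuples with city = 'London'


def cost_strategy_1() -> int:
    # Read both relations, write the Cartesian product, read it back to select.
    return (STAFF + BRANCH) + 2 * (STAFF * BRANCH)


def cost_strategy_2() -> int:
    # Read both relations to join them; the join has one tuple per staff member,
    # which is written to disk and read back for the selection.
    return 2 * STAFF + (STAFF + BRANCH)


def cost_strategy_3() -> int:
    # Read Staff and write the managers, read Branch and write the London
    # branches, then read both reduced relations to join them.
    return STAFF + MANAGERS + BRANCH + LONDON + (MANAGERS + LONDON)


costs = {1: cost_strategy_1(), 2: cost_strategy_2(), 3: cost_strategy_3()}
best = min(costs, key=costs.get)
for strategy, cost in costs.items():
    print(f"strategy {strategy}: {cost} disk accesses")
print(f"cheapest: strategy {best}")
```

Changing the four constants shows how quickly the Cartesian product term dominates as the relations grow.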


    Phases of Query Processing

    QP has four main phases:

    decomposition (consisting of parsing and validation);

    optimization;

    code generation;

    execution.


    Dynamic versus Static Optimization

    Two times when first three phases of QP can be carried out:

    dynamically every time query is run;

    statically when query is first submitted.

    Advantages of dynamic QO arise from fact that information is up to date.

    Disadvantages are that performance of query is affected, and time may limit finding optimum strategy.


    Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy.

    Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query is run.

    Could use a hybrid approach to overcome this.


    Query Decomposition

    Aims are to transform high-level query into RA query and check that query is syntactically and semantically correct.

    Typical stages are:

    analysis,

    normalization,

    semantic analysis,

    simplification,

    query restructuring.


    Analysis

    Analyze query lexically and syntactically using compiler techniques.

    Verify relations and attributes exist.

    Verify operations are appropriate for object type.


    Analysis - Example

    SELECT staff_no

    FROM Staff

    WHERE position > 10;

    This query would be rejected on two grounds:

    staff_no is not defined for Staff relation (should be staffNo).

    Comparison >10 is incompatible with type of position, which is variable character string.
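The two checks can be sketched as a lookup against a catalog. Everything below is hypothetical: the `CATALOG` dictionary, the `analyze` function, and the type names are invented for illustration and do not describe how any particular DBMS does it.

```python
# Hypothetical catalog entry; only the attributes needed for the example.
CATALOG = {
    "Staff": {"staffNo": "varchar", "position": "varchar", "salary": "decimal"},
}


def analyze(relation, select_attrs, predicates):
    """Return a list of error messages for a single-relation query.

    predicates is a list of (attribute, operator, literal) triples.
    """
    if relation not in CATALOG:
        return [f"relation {relation} does not exist"]
    schema = CATALOG[relation]
    errors = []
    for attr in select_attrs:
        if attr not in schema:
            errors.append(f"attribute {attr} is not defined for {relation}")
    for attr, op, literal in predicates:
        if attr not in schema:
            errors.append(f"attribute {attr} is not defined for {relation}")
        elif schema[attr] == "varchar" and not isinstance(literal, str):
            errors.append(
                f"comparison {op} {literal} is incompatible with type of {attr}"
            )
    return errors


# The query from the example: SELECT staff_no FROM Staff WHERE position > 10
errors = analyze("Staff", ["staff_no"], [("position", ">", 10)])
for message in errors:
    print(message)
```

A corrected query (`staffNo`, and a string literal compared with `position`) produces an empty error list.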


    Example 23.1 - R.A.T.


    Normalization

    Converts query into a normalized form for easier manipulation.

    Predicate can be converted into one of two forms:

    Conjunctive normal form:

    $(\text{position = 'Manager'} \vee \text{salary > 20000}) \wedge (\text{branchNo = 'B003'})$

    Disjunctive normal form:

    $(\text{position = 'Manager'} \wedge \text{branchNo = 'B003'}) \vee (\text{salary > 20000} \wedge \text{branchNo = 'B003'})$
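That the two normal forms describe the same predicate can be checked by brute force over the three atomic conditions. This is a small sketch; the function and parameter names are illustrative.

```python
from itertools import product


def cnf(is_manager: bool, high_salary: bool, in_b003: bool) -> bool:
    # (position = 'Manager' OR salary > 20000) AND (branchNo = 'B003')
    return (is_manager or high_salary) and in_b003


def dnf(is_manager: bool, high_salary: bool, in_b003: bool) -> bool:
    # (position = 'Manager' AND branchNo = 'B003')
    #   OR (salary > 20000 AND branchNo = 'B003')
    return (is_manager and in_b003) or (high_salary and in_b003)


# Compare the two forms on all 8 truth assignments.
equivalent = all(
    cnf(*values) == dnf(*values) for values in product([False, True], repeat=3)
)
print("equivalent:", equivalent)
```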


    Semantic Analysis

    Rejects normalized queries that are incorrectly formulated or contradictory.

    Query is incorrectly formulated if components do not contribute to generation of result.

    Query is contradictory if its predicate cannot be satisfied by any tuple.

    Algorithms to determine correctness exist only for queries that do not contain disjunction and negation.


    For these queries, could construct:

    A relation connection graph.

    Normalized attribute connection graph.

    Relation connection graph

    Create node for each relation and node for result.

    Create edges between two nodes that represent a join, and edges between nodes that represent projection.

    If not connected, query is incorrectly formulated.
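The connectivity test on a relation connection graph can be sketched with a breadth-first search. The example query, the node names, and the `is_connected` helper are assumptions made for illustration; edges are treated as undirected, since only reachability matters here.

```python
from collections import deque


def is_connected(nodes, edges):
    """Breadth-first search over an undirected view of the graph."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = nodes[0]
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return seen == set(nodes)


# Illustrative query over three relations, projecting attributes of PropertyForRent.
nodes = ["Client", "Viewing", "PropertyForRent", "Result"]
good_edges = [
    ("Client", "Viewing"),            # join
    ("Viewing", "PropertyForRent"),   # join
    ("PropertyForRent", "Result"),    # projection
]
# Same query with the Viewing/PropertyForRent join condition left out.
bad_edges = [("Client", "Viewing"), ("PropertyForRent", "Result")]

print(is_connected(nodes, good_edges))
print(is_connected(nodes, bad_edges))
```

With the join condition omitted, Client and Viewing cannot reach the result node, so the query would be flagged as incorrectly formulated.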


    Simplification

    Detects redundant qualifications,

    eliminates common sub-expressions,

    transforms query to semantically equivalent but more easily and efficiently computed form.

    Typically, access restrictions, view definitions, and integrity constraints are considered.

    Assuming user has appropriate access privileges, first apply well-known idempotency rules of boolean algebra.
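For reference, the boolean-algebra identities usually meant by the idempotency rules are listed here for arbitrary predicates $p$ and $q$. The slide does not list them, so this is the standard set rather than a quotation.

```latex
\begin{align*}
p \wedge p &\equiv p                       & p \vee p &\equiv p \\
p \wedge \text{false} &\equiv \text{false} & p \vee \text{false} &\equiv p \\
p \wedge \text{true} &\equiv p             & p \vee \text{true} &\equiv \text{true} \\
p \wedge \neg p &\equiv \text{false}       & p \vee \neg p &\equiv \text{true} \\
p \wedge (p \vee q) &\equiv p              & p \vee (p \wedge q) &\equiv p
\end{align*}
```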


    Transformation Rules for RA Operations

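    Conjunctive Selection operations can cascade into individual Selection operations (and vice versa).

    $\sigma_{p \wedge q \wedge r}(R) = \sigma_p(\sigma_q(\sigma_r(R)))$

    Sometimes referred to as cascade of Selection. For example:

    $\sigma_{\text{branchNo='B003'} \wedge \text{salary>15000}}(\text{Staff}) = \sigma_{\text{branchNo='B003'}}(\sigma_{\text{salary>15000}}(\text{Staff}))$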


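    Commutativity of Selection.

    $\sigma_p(\sigma_q(R)) = \sigma_q(\sigma_p(R))$

    For example:

    $\sigma_{\text{branchNo='B003'}}(\sigma_{\text{salary>15000}}(\text{Staff})) = \sigma_{\text{salary>15000}}(\sigma_{\text{branchNo='B003'}}(\text{Staff}))$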
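The cascade and commutativity rules for Selection can be checked on a toy relation. In this sketch a relation is modelled as a list of dictionaries; the `select` helper and the sample tuples are made up for illustration.

```python
def select(predicate, relation):
    """Selection: keep the tuples of relation that satisfy predicate."""
    return [t for t in relation if predicate(t)]


# Made-up Staff tuples.
staff = [
    {"staffNo": "S1", "branchNo": "B003", "salary": 12000},
    {"staffNo": "S2", "branchNo": "B003", "salary": 18000},
    {"staffNo": "S3", "branchNo": "B005", "salary": 30000},
    {"staffNo": "S4", "branchNo": "B007", "salary": 9000},
]


def in_b003(t):
    return t["branchNo"] == "B003"


def well_paid(t):
    return t["salary"] > 15000


# One conjunctive selection ...
conjunctive = select(lambda t: in_b003(t) and well_paid(t), staff)
# ... equals a cascade of two selections ...
cascade = select(in_b003, select(well_paid, staff))
# ... in either order.
swapped = select(well_paid, select(in_b003, staff))

print(conjunctive == cascade == swapped)
print([t["staffNo"] for t in conjunctive])
```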


    In a sequence of Projection operations, only the last in the sequence is required.

    $\Pi_L \Pi_M \ldots \Pi_N(R) = \Pi_L(R)$

    For example:

    $\Pi_{\text{lName}} \Pi_{\text{branchNo, lName}}(\text{Staff}) = \Pi_{\text{lName}}(\text{Staff})$


    Commutativity of Selection and Projection.

    If predicate p involves only attributes in projection list, Selection and Projection operations commute:

    $\Pi_{A_1, \ldots, A_m}(\sigma_p(R)) = \sigma_p(\Pi_{A_1, \ldots, A_m}(R))$, where $p \in \{A_1, A_2, \ldots, A_m\}$

    For example:

    $\Pi_{\text{fName, lName}}(\sigma_{\text{lName='Beech'}}(\text{Staff})) = \sigma_{\text{lName='Beech'}}(\Pi_{\text{fName, lName}}(\text{Staff}))$


    Commutativity of Theta join (and Cartesian product).

    $R \bowtie_p S = S \bowtie_p R$

    $R \times S = S \times R$

    Rule also applies to Equijoin and Natural join. For example:

    $\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch} = \text{Branch} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Staff}$


    Commutativity of Selection and Theta join (or Cartesian product).

    If selection predicate involves only attributes of one of join relations, Selection and Join (or Cartesian product) operations commute:

    $\sigma_p(R \bowtie_r S) = (\sigma_p(R)) \bowtie_r S$

    $\sigma_p(R \times S) = (\sigma_p(R)) \times S$

    where $p \in \{A_1, A_2, \ldots, A_n\}$


    If selection predicate is conjunctive predicate having form $(p \wedge q)$, where p only involves attributes of R, and q only attributes of S, Selection and Theta join operations commute as:

    $\sigma_{p \wedge q}(R \bowtie_r S) = (\sigma_p(R)) \bowtie_r (\sigma_q(S))$

    $\sigma_{p \wedge q}(R \times S) = (\sigma_p(R)) \times (\sigma_q(S))$


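    For example:

    $\sigma_{\text{position='Manager'} \wedge \text{city='London'}}(\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch}) =$

    $(\sigma_{\text{position='Manager'}}(\text{Staff})) \bowtie_{\text{Staff.branchNo=Branch.branchNo}} (\sigma_{\text{city='London'}}(\text{Branch}))$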
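Pushing the two selections below the join gives the same result while joining far fewer tuples. The sketch below models relations as lists of dictionaries; the `select` and `theta_join` helpers and the sample data are illustrative assumptions, not part of the example.

```python
def select(predicate, relation):
    """Selection: keep the tuples that satisfy predicate."""
    return [t for t in relation if predicate(t)]


def theta_join(left, right, condition):
    """Nested-loop theta join; matching tuples are merged into one dict."""
    return [{**a, **b} for a in left for b in right if condition(a, b)]


# Made-up sample data.
staff = [
    {"staffNo": "SL21", "position": "Manager", "branchNo": "B005"},
    {"staffNo": "SG37", "position": "Assistant", "branchNo": "B003"},
    {"staffNo": "SG5", "position": "Manager", "branchNo": "B003"},
    {"staffNo": "SA9", "position": "Assistant", "branchNo": "B007"},
]
branch = [
    {"branchNo": "B005", "city": "London"},
    {"branchNo": "B003", "city": "Glasgow"},
    {"branchNo": "B007", "city": "Aberdeen"},
]


def same_branch(s, b):
    return s["branchNo"] == b["branchNo"]


def is_manager(t):
    return t["position"] == "Manager"


def in_london(t):
    return t["city"] == "London"


# Left-hand side: join first, select afterwards.
joined = theta_join(staff, branch, same_branch)
late = select(lambda t: is_manager(t) and in_london(t), joined)

# Right-hand side: select on each relation first, then join.
managers = select(is_manager, staff)
london = select(in_london, branch)
early = theta_join(managers, london, same_branch)

print(late == early)
print(len(staff) * len(branch), "pairs compared when joining first")
print(len(managers) * len(london), "pairs compared when selecting first")
```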


    Commutativity of Projection and Theta join (or Cartesian product).

    If projection list is of form $L = L_1 \cup L_2$, where $L_1$ only has attributes of R, and $L_2$ only has attributes of S, provided join condition only contains attributes of L, Projection and Theta join commute:

    $\Pi_{L_1 \cup L_2}(R \bowtie_r S) = (\Pi_{L_1}(R)) \bowtie_r (\Pi_{L_2}(S))$


    If join condition contains additional attributes not in L ($M = M_1 \cup M_2$ where $M_1$ only has attributes of R, and $M_2$ only has attributes of S), a final projection operation is required:

    $\Pi_{L_1 \cup L_2}(R \bowtie_r S) = \Pi_{L_1 \cup L_2}((\Pi_{L_1 \cup M_1}(R)) \bowtie_r (\Pi_{L_2 \cup M_2}(S)))$


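    For example:

    $\Pi_{\text{position, city, branchNo}}(\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch}) =$

    $(\Pi_{\text{position, branchNo}}(\text{Staff})) \bowtie_{\text{Staff.branchNo=Branch.branchNo}} (\Pi_{\text{city, branchNo}}(\text{Branch}))$

    and using the latter rule:

    $\Pi_{\text{position, city}}(\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch}) =$

    $\Pi_{\text{position, city}}((\Pi_{\text{position, branchNo}}(\text{Staff})) \bowtie_{\text{Staff.branchNo=Branch.branchNo}} (\Pi_{\text{city, branchNo}}(\text{Branch})))$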


    Commutativity of Union and Intersection (but not set difference).

    $R \cup S = S \cup R$

    $R \cap S = S \cap R$


    Commutativity of Selection and set operations (Union, Intersection, and Set difference).

    $\sigma_p(R \cup S) = \sigma_p(S) \cup \sigma_p(R)$

    $\sigma_p(R \cap S) = \sigma_p(S) \cap \sigma_p(R)$

    $\sigma_p(R - S) = \sigma_p(R) - \sigma_p(S)$
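A quick check with Python sets shows Selection distributing over the three set operations. Because set difference is not commutative, its operands must keep their order on the right-hand side. The sets and the predicate below are arbitrary illustrations.

```python
def select(predicate, relation):
    """Selection over a relation modelled as a set."""
    return {t for t in relation if predicate(t)}


def is_even(value):
    return value % 2 == 0


# Arbitrary union-compatible "relations" of single-attribute tuples.
R = {1, 2, 3, 4, 5, 6}
S = {4, 5, 6, 7, 8, 9}

union_ok = select(is_even, R | S) == select(is_even, S) | select(is_even, R)
intersection_ok = select(is_even, R & S) == select(is_even, S) & select(is_even, R)
difference_ok = select(is_even, R - S) == select(is_even, R) - select(is_even, S)

# Swapping the operands of the difference gives a different relation.
swapped_difference = select(is_even, S) - select(is_even, R)

print(union_ok, intersection_ok, difference_ok)
print(select(is_even, R - S), swapped_difference)
```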


    Commutativity of Projection and Union.

    $\Pi_L(R \cup S) = \Pi_L(S) \cup \Pi_L(R)$

    Associativity of Union and Intersection (but not Set difference).

    $(R \cup S) \cup T = S \cup (R \cup T)$

    $(R \cap S) \cap T = S \cap (R \cap T)$


    Associativity of Theta join (and Cartesian product).

    Cartesian product and Natural join are always associative:

    $(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$

    $(R \times S) \times T = R \times (S \times T)$

    If join condition q involves attributes only from S and T, then Theta join is associative:

    $(R \bowtie_p S) \bowtie_{q \wedge r} T = R \bowtie_{p \wedge r} (S \bowtie_q T)$


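    For example:

    $(\text{Staff} \bowtie_{\text{Staff.staffNo=PropertyForRent.staffNo}} \text{PropertyForRent}) \bowtie_{\text{ownerNo=Owner.ownerNo} \wedge \text{Staff.lName=Owner.lName}} \text{Owner} =$

    $\text{Staff} \bowtie_{\text{Staff.staffNo=PropertyForRent.staffNo} \wedge \text{Staff.lName=lName}} (\text{PropertyForRent} \bowtie_{\text{ownerNo}} \text{Owner})$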


    Example 23.3 - Use of Transformation Rules

    For prospective renters of flats, find properties that match requirements and are owned by CO93.

    SELECT p.propertyNo, p.street
    FROM Client c, Viewing v, PropertyForRent p
    WHERE c.prefType = 'Flat' AND
          c.clientNo = v.clientNo AND
          v.propertyNo = p.propertyNo AND
          c.maxRent >= p.rent AND
          c.prefType = p.type AND
          p.ownerNo = 'CO93';


    Example 23.3 Use of Transformation Rules


    Heuristical Processing Strategies

    Perform Selection operations as early as possible.

    Keep predicates on same relation together.

    Combine Cartesian product with subsequent Selection whose predicate represents join condition into a Join operation.

    Use associativity of binary operations to rearrange leaf nodes so leaf nodes with most restrictive Selection operations are executed first.


    Perform Projection as early as possible.

    Keep projection attributes on same relation together.

    Compute common expressions once.

    If common expression appears more than once, and result not too large, store result and reuse it when required.

    Useful when querying views, as same expression is used to construct view each time.


    Cost Estimation for RA Operations

    Many different ways of implementing RA operations.

    Aim of QO is to choose most efficient one.

    Use formulae that estimate costs for a number of options, and select one with lowest cost.

    Consider only cost of disk access, which is usually dominant cost in QP.

    Many estimates are based on cardinality of the relation, so need to be able to estimate this.


    Database Statistics

    Success of estimation depends on amount and currency of statistical information DBMS holds.

    Keeping statistics current can be problematic.

    If statistics updated every time tuple is changed, this would impact performance.

    DBMS could update statistics on a periodic basis, for example nightly, or whenever the system is idle.


    Query Optimization in Oracle

    Oracle supports two approaches to query optimization: rule-based and cost-based.

    Rule-based

    15 rules, ranked in order of efficiency. Particular access path for a table only chosen if statement contains a predicate or other construct that makes that access path available.

    Score assigned to each execution strategy using these rankings and strategy with best (lowest) score selected.


    QO in Oracle - Rule-Based

    When 2 strategies have same score, tie-break resolved by making decision based on order in which tables occur in the SQL statement.


    QO in Oracle - Rule-Based: Example

    SELECT propertyNo
    FROM PropertyForRent
    WHERE rooms > 7 AND city = 'London'

    Single-column access path using index on city from WHERE condition (city = 'London'). Rank 9.

    Unbounded range scan using index on rooms from WHERE condition (rooms > 7). Rank 11.

    Full table scan. Rank 15.

    Although there is index on propertyNo, column does not appear in WHERE clause and so is not considered by optimizer.

    Based on these paths, rule-based optimizer will choose to use index based on city column.
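The choice in this example amounts to picking the available access path with the lowest rank. The sketch below only mirrors that selection step using the three ranks quoted above; the data structure and names are illustrative and say nothing about Oracle's internal implementation.

```python
# (description, rank) pairs for the access paths available to the example query.
# Ranks are the ones quoted in the example; lower rank means preferred.
access_paths = [
    ("single-column index on city (city = 'London')", 9),
    ("unbounded range scan on index on rooms (rooms > 7)", 11),
    ("full table scan", 15),
]

best_path, best_rank = min(access_paths, key=lambda path: path[1])
print(f"chosen: {best_path} (rank {best_rank})")
```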


    QO in Oracle - Cost-Based

    To improve QO, Oracle introduced cost-based optimizer in Oracle 7, which selects strategy that requires minimal resource use necessary to process all rows accessed by query (avoiding above tie-break anomaly).

    User can select whether minimal resource usage is based on throughput or based on response time, by setting the OPTIMIZER_MODE initialization parameter.

    Cost-based optimizer also takes into consideration hints that the user may provide.


    QO in Oracle - Statistics

    Cost-based optimizer depends on statistics for all tables, clusters, and indexes accessed by query.

    User's responsibility to generate these statistics and keep them current.

    Package DBMS_STATS can be used to generate and manage statistics.

    Whenever possible, Oracle uses a parallel method to gather statistics, although index statistics are collected serially.


    QO in Oracle - Histograms

    Previously made assumption that data values within columns of a table are uniformly distributed.

    Histogram of values and their relative frequencies gives optimizer improved selectivity estimates in presence of non-uniform distribution.


    (a) uniform distribution of rooms; (b) actual non-uniform distribution.

    (a) can be stored compactly as low value (1) and high value (10), and as total count of all frequencies (in this case, 100).


    Histogram is data structure that can improve estimates of number of tuples in result.

    Two types of histogram:

    width-balanced histogram, which divides data into a fixed number of equal-width ranges (called buckets), each containing count of number of values falling within that bucket;

    height-balanced histogram, which places approximately same number of values in each bucket so that end points of each bucket are determined by how many values are in that bucket.
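The two histogram types can be sketched on a small data set. The frequencies below are invented (100 rows with rooms between 1 and 10, non-uniformly distributed), and both functions are illustrative helpers rather than anything Oracle exposes.

```python
# Invented frequencies for rooms = 1..10; they sum to 100 rows.
frequencies = {1: 5, 2: 5, 3: 10, 4: 15, 5: 25, 6: 15, 7: 10, 8: 5, 9: 5, 10: 5}
rooms = [value for value, count in frequencies.items() for _ in range(count)]


def width_balanced(values, low, high, buckets):
    """Count values in each of `buckets` equal-width ranges over [low, high]."""
    width = (high - low + 1) // buckets
    counts = [0] * buckets
    for value in values:
        index = min((value - low) // width, buckets - 1)
        counts[index] += 1
    return counts


def height_balanced(values, buckets):
    """Return the end point of each bucket when every bucket holds the same
    number of values (assumes len(values) is divisible by buckets)."""
    ordered = sorted(values)
    per_bucket = len(ordered) // buckets
    return [ordered[(i + 1) * per_bucket - 1] for i in range(buckets)]


# Buckets 1-2, 3-4, 5-6, 7-8, 9-10 hold different numbers of rows.
print(width_balanced(rooms, 1, 10, 5))   # [10, 25, 40, 15, 10]
# Each bucket holds 20 rows; the end points carry the distribution instead.
print(height_balanced(rooms, 5))         # [3, 5, 5, 7, 10]
```

A value that appears as the end point of more than one bucket (here 5) signals a frequently occurring value, which is the kind of skew the uniform assumption misses.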


    (a) width-balanced for rooms with 5 buckets. Each bucket of equal width with 2 values (1-2, 3-4, etc.)

    (b) height-balanced: height of each column is 20 (100/5).


    QO in Oracle - Viewing Execution Plan