University of Konstanz Advances in Database Query Processing Sahak Maloyan Avoiding Sorting and Grouping In Processing Queries Sahak Maloyan

Avoiding Sorting and Grouping in Processing Queries

Sahak Maloyan

Outline

• Motivation
• Simple Example
• Order Properties
• Grouping followed by ordering
• Order Property Optimization
• Performance Results
• Conclusion

Motivation

• Previous presentation:
  – Fundamental Techniques for Order Optimization
  – Using FDs and selection predicates
  – Determining order propagation from input to output
  – Infer from ordering

• Current presentation:
  – Aside from orderings, we also infer how relations are grouped (i.e., how records in relations are clustered according to the values of certain attributes)
  – Infer from grouping
  – Infer from secondary ordering

Motivation(cont.)

• Inferred orderings
  – Make it possible to avoid sorting when preprocessing ORDER BY clauses of an SQL query

• Inferred groupings
  – Avoid sorting or hashing prior to computing aggregates for GROUP BY clauses
  – Reduce the cost of projection with duplicate elimination
  – Complete projection and duplicate elimination in a single pass
  – Reduce the cost of evaluating selection queries of the form σ_{A=k}(R) in the absence of indexes or an ordering on A

• Inference of secondary ordering and grouping
  – Avoid unnecessary sorting or grouping over multiple attributes
  – Infer new primary orderings or groupings (example follows)
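
The selection point above can be made concrete with a small sketch. It assumes (the slide does not spell this out) that the saving comes from stopping the scan as soon as the group with A = k has been passed. The function `select_eq` and the dictionary row format are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def select_eq(rows: List[Row], attr: str, k: int) -> Tuple[List[Row], int]:
    """Evaluate sigma_{attr=k} over rows that are grouped (not necessarily sorted) on attr.

    Returns the matching rows and the number of rows that were read.
    """
    matches: List[Row] = []
    scanned = 0
    for row in rows:
        scanned += 1
        if row[attr] == k:
            matches.append(row)
        elif matches:
            # Grouping guarantees that no further row with attr == k can follow.
            break
    return matches, scanned


# Grouped on A, but the groups do not appear in sorted order.
R = [{"A": 3, "B": 1}, {"A": 3, "B": 2}, {"A": 1, "B": 5},
     {"A": 1, "B": 6}, {"A": 2, "B": 7}, {"A": 2, "B": 8}]

result, scanned = select_eq(R, "A", 1)
```

Without the grouping guarantee the scan would have to read all six rows; here it stops at the first row after the A = 1 group.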

Simple Example

• Benefits of inferring grouping and secondary ordering

TPC-H Query

    SELECT c_custkey, COUNT(*)
    FROM Customer, Supplier
    WHERE c_nationkey = s_nationkey
    GROUP BY c_custkey

How many suppliers could supply each customer directly, without having to go through customs?

Simple Example (cont.)

Postgres QEP of the query (root at the top):

    group: c_custkey, count(*)
      sort: c_custkey
        merge-join: c_nationkey = s_nationkey
          sort: c_nationkey
            table scan: customer
          sort: s_nationkey
            table scan: supplier

• The Postgres plan first sorts the join result on the grouping attribute c_custkey so that it can aggregate over groups in a single pass. But one-pass aggregation requires the data only to be grouped, not sorted!

• The sort-merge join result is sorted (and hence grouped) on c_nationkey; the output tuples in the same group with respect to c_nationkey are themselves grouped on the key of the outer relation (c_custkey).

• Hence the join result satisfies c_nationkey^G → c_custkey^G. Since c_custkey is a key and therefore functionally determines c_nationkey, the result is grouped on c_custkey, and no sort is needed.
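
The observation about the merge-join output can be checked on toy data. The following is a minimal sketch, not Postgres code: the tables, the tuple layout and the helper `merge_join` are invented for illustration. It joins two inputs sorted on the nation key and then counts per customer in a single pass with `itertools.groupby`, which only relies on equal keys being adjacent.

```python
from collections import Counter
from itertools import groupby

# Hypothetical toy data: (c_custkey, c_nationkey) and (s_suppkey, s_nationkey).
customers = [(1, 10), (2, 20), (3, 10), (4, 20)]
suppliers = [(100, 10), (101, 20), (102, 10)]


def merge_join(outer, inner):
    """Merge equijoin on the nation key; both inputs must be sorted on it."""
    out = []
    i = 0
    for c_custkey, c_nat in outer:
        # Skip inner tuples whose nation key is smaller than the current one.
        while i < len(inner) and inner[i][1] < c_nat:
            i += 1
        # Emit all matches for this outer tuple before moving to the next one.
        j = i
        while j < len(inner) and inner[j][1] == c_nat:
            out.append((c_custkey, c_nat, inner[j][0]))
            j += 1
    return out


joined = merge_join(sorted(customers, key=lambda c: c[1]),
                    sorted(suppliers, key=lambda s: s[1]))

# One-pass aggregation: no sort on c_custkey, only adjacency of equal keys.
one_pass = [(custkey, sum(1 for _ in grp))
            for custkey, grp in groupby(joined, key=lambda t: t[0])]

# Reference result computed by hashing, for comparison.
reference = Counter(t[0] for t in joined)
```

In this run the customer keys come out in the order 1, 3, 2, 4: grouped, but not sorted, and the one-pass counts still agree with the hashed reference.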

Order Properties

• Order properties have the form A1^α1 → A2^α2 → … → An^αn
• Each Ai is an attribute; each αi specifies either an ordering (αi = O) or a grouping (αi = G)
• A1^α1 is the primary ordering or grouping, A2^α2 the secondary, and so on
• Order properties are formalized with an algebra of constructors for:
  – the empty ordering
  – the combination of orderings
  – the basic orderings: order or group

Grouping followed by ordering

• Suppose that R = (A, B) consists of 10 tuples, t1, …, t10, and its physical representation satisfies the order property A^O → B^G. This situation is illustrated below.

Grouping followed by ordering (cont.)

An illustration of A^O → B^G (figure): the ten tuples fall into three groups by A, which appear in the order A=1, A=2, A=3.
  – A=1: t1 and t2 share one B value; t3 has another
  – A=2: t8, t9 and t10 have three different B values
  – A=3: t4 and t5 share one B value; t6 and t7 share another

• The primary ordering (A^O) says that the group of tuples with A=1 precedes the group of tuples with A=2, which precedes the group with A=3

• The secondary ordering (B^G) says that within each group of tuples with like values of A, tuples are clustered together if they have the same value for B

• t1 can precede t2 or t2 can precede t1, but they must be adjacent

Two example permutations that satisfy the order property:

    t2, t1, t3, t10, t8, t9, t6, t7, t4, t5
    t1, t2, t3, t9, t8, t10, t4, t5, t6, t7
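
A short checker makes the meaning of such an order property concrete. This is a sketch under assumptions: the function `satisfies`, the list-of-pairs encoding of an order property, and the concrete B values assigned to each tuple are all chosen for illustration. Only the A values and the grouping of tuples into equal-B clusters follow the example.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]
Prop = Sequence[Tuple[str, str]]  # e.g. [("A", "O"), ("B", "G")]


def satisfies(rows: List[Row], prop: Prop) -> bool:
    """Check whether the physical sequence `rows` satisfies the order property."""
    if not prop:
        return True
    (attr, kind), rest = prop[0], prop[1:]
    # Split the sequence into maximal runs of equal values of attr.
    runs: List[List[Row]] = []
    for row in rows:
        if runs and runs[-1][0][attr] == row[attr]:
            runs[-1].append(row)
        else:
            runs.append([row])
    values = [run[0][attr] for run in runs]
    if len(set(values)) != len(values):
        return False  # a value shows up in two separate runs: not grouped
    if kind == "O" and values != sorted(values):
        return False  # grouped, but the groups are not in ascending order
    # The remaining attributes must hold within every group.
    return all(satisfies(run, rest) for run in runs)


# Assumed B values; only "same B" versus "different B" matters here.
T = {
    "t1": {"A": 1, "B": 1}, "t2": {"A": 1, "B": 1}, "t3": {"A": 1, "B": 2},
    "t8": {"A": 2, "B": 1}, "t9": {"A": 2, "B": 3}, "t10": {"A": 2, "B": 2},
    "t4": {"A": 3, "B": 1}, "t5": {"A": 3, "B": 1},
    "t6": {"A": 3, "B": 2}, "t7": {"A": 3, "B": 2},
}


def rows(names: str) -> List[Row]:
    return [T[name] for name in names.split()]


AO_BG = [("A", "O"), ("B", "G")]
perm1 = rows("t2 t1 t3 t10 t8 t9 t6 t7 t4 t5")
perm2 = rows("t1 t2 t3 t9 t8 t10 t4 t5 t6 t7")
bad = rows("t1 t3 t2 t10 t8 t9 t6 t7 t4 t5")  # t1 and t2 are separated by t3
```

Both example permutations pass the check for A^O → B^G, while `bad` fails because t1 and t2 are no longer adjacent. With the B values assumed here, `perm1` also fails the stronger property A^O → B^O.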

Computing with Order Properties (cont.)

• The general properties have the form:

• Shorthand:

• Also, given o1 = A1^α1 → … → Am^αm and o2 = B1^β1 → … → Bn^βn, the shorthand "o1 → o2" (concatenation of order properties) denotes A1^α1 → … → Am^αm → B1^β1 → … → Bn^βn

Order Properties (cont.)

• For any order property that holds of a physical relation R, all prefixes of that order property also hold of R

• An ordering on any attribute implies a grouping on that attribute

• If X functionally determines B, and all attributes in X appear (ordered or grouped) in an order property before B^α, then B^α is superfluous.

• Identities

Order Properties (cont.)

• Identities (cont.)

• special case of identity #3, covering the case where X consists of a single attribute

• the grouping of an attribute that is functionally determined by the attribute that follows it in the order property is superfluous
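
The two FD-based identities can be sketched as a small rewriting function. This is an illustrative sketch, not the algebra from the slides: `simplify`, the encoding of functional dependencies as (frozenset, attribute) pairs, and the conservative choice to drop a leading grouping only when the following attribute is itself a grouping are all assumptions. The function returns an order property that is implied by its input.

```python
from typing import FrozenSet, List, Tuple

Prop = List[Tuple[str, str]]      # e.g. [("c_nationkey", "G"), ("c_custkey", "G")]
FD = Tuple[FrozenSet[str], str]   # (X, B) stands for X -> B


def simplify(prop: Prop, fds: List[FD]) -> Prop:
    """Drop superfluous components of an order property using the given FDs."""
    result = list(prop)
    changed = True
    while changed:
        changed = False
        # B is superfluous if some X -> B has all of X earlier in the property.
        seen = set()
        for i, (attr, _) in enumerate(result):
            if any(rhs == attr and lhs <= seen for lhs, rhs in fds):
                del result[i]
                changed = True
                break
            seen.add(attr)
        if changed:
            continue
        # A^G is superfluous if the attribute that follows it determines A.
        # Assumption: applied only when the following attribute is also grouped.
        for i in range(len(result) - 1):
            (a, a_kind), (b, b_kind) = result[i], result[i + 1]
            if a_kind == "G" and b_kind == "G" and (frozenset({b}), a) in fds:
                del result[i]
                changed = True
                break
    return result


# c_custkey is a key of Customer, so it determines c_nationkey.
fds = [(frozenset({"c_custkey"}), "c_nationkey")]
simple_example = simplify([("c_nationkey", "G"), ("c_custkey", "G")], fds)
```

Applied to the property of the merge-join result in the simple example, the sketch yields a primary grouping on c_custkey, which is what the Group operator needs.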

Order Property Inference

• Using the algebra of order properties and their formal definitions, we can derive inference rules that state how order properties propagate through relational operators such as joins.

Order Property Optimization

• Postgres Plan Operators Summarized
• The data structures for all plan nodes in Postgres include the following fields:
  – inp1, …, inpn: the fields contained in all input tuples to the node
  – left: the left subtree of the node (set to Null for leaf nodes and Append)
  – right: the right subtree of the node (set to Null for leaf nodes, unary operators and Append)

Order Property Optimization

• Postgres Plan Operators Summarized (cont.)
• Additional operator-specific fields provided by Postgres and used by our refinement algorithm

Order Property Optimization

• Postgres Plan Operators Summarized (cont.)
• Group: performs two passes over its input:
  – insert Null values between pairs of consecutive tuples with different values for attributes att1, …, attk
  – apply functions Fk+1, …, Fn to the collection of values of attributes attk+1, …, attn, respectively, for each set of tuples separated by Nulls
• Hash: builds a hash table over its input using a predetermined hash function over attribute att.
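
The two passes of Group can be sketched as two small generators. This is only an illustration of the description above, not Postgres source code: `insert_nulls`, `apply_aggregates`, the dictionary rows and the `aggs` parameter are hypothetical. The input only has to be grouped on the grouping attributes; it does not have to be sorted.

```python
from itertools import chain
from typing import Callable, Dict, Iterable, Iterator, Optional, Sequence, Tuple

Row = Dict[str, int]


def insert_nulls(rows: Iterable[Row], group_attrs: Sequence[str]) -> Iterator[Optional[Row]]:
    """Pass 1: put a None separator between consecutive tuples of different groups."""
    prev = None
    for row in rows:
        key = tuple(row[a] for a in group_attrs)
        if prev is not None and key != prev:
            yield None
        yield row
        prev = key


def apply_aggregates(marked: Iterable[Optional[Row]], group_attrs: Sequence[str],
                     aggs: Dict[str, Tuple[str, Callable]]) -> Iterator[Row]:
    """Pass 2: apply each aggregate to every set of tuples separated by None."""
    batch = []
    for row in chain(marked, [None]):  # the trailing None flushes the last batch
        if row is not None:
            batch.append(row)
        elif batch:
            out = {a: batch[0][a] for a in group_attrs}
            for name, (attr, fn) in aggs.items():
                out[name] = fn([r[attr] for r in batch])
            yield out
            batch = []


# Grouped on c_custkey but not sorted on it.
joined = [{"c_custkey": 3, "s_suppkey": 1}, {"c_custkey": 3, "s_suppkey": 2},
          {"c_custkey": 1, "s_suppkey": 5},
          {"c_custkey": 2, "s_suppkey": 7}, {"c_custkey": 2, "s_suppkey": 8}]

counts = list(apply_aggregates(insert_nulls(joined, ["c_custkey"]),
                               ["c_custkey"], {"count": ("s_suppkey", len)}))
```

If the input were not grouped, the same customer would appear in several output rows, which is why the plan has to guarantee a grouping before Group runs.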

Order Property Optimization

• HJoin: performs a (non-order-preserving) simple hash equijoin (att1 = att2) with the relation produced by left as the probe relation, and the relation produced by right as the build relation.

• Merge: performs a merge equijoin (att1 = att2) with the relation produced by left as the outer relation, and the relation produced by right as the inner relation.

• NOP: has been added as a dummy plan operator that is temporarily made the root of a Postgres plan prior to its refinement.

Order Property Optimization

• A Plan Refinement Algorithm
• Input: query plan tree generated by Postgres
• Output: an equivalent plan tree with unnecessary Sort operators (used either to order or to group) removed
• Requires: 4 new attributes associated with every node in a query plan tree

Order Property Optimization

• A Plan Refinement Algorithm (cont.)
• New attributes of a plan node n:
  – keys: a set of attribute sets that are guaranteed to be keys of inputs to n
  – fds: a set of functional dependencies (attribute sets → attribute) that are guaranteed to hold of inputs to n
  – req: a single order property that is required to hold of inputs either to n or to some ancestor node of n for that node to execute
  – sat: a set of order properties that are guaranteed to be satisfied by outputs of n

Order Property Optimization

• A Plan Refinement Algorithm (cont.)
• Idea:
  – decorate the input plan with these new attributes
  – remove any Sort operator whose child node produces a result that is guaranteed to satisfy an order property required by its parent node
• Accomplished in 3 passes over the input plan

Order Property Optimization

• A Plan Refinement Algorithm (cont.)
• Refinement of the query plan (NOP is made the temporary root):

    NOP
      group: c_custkey, count(*)
        sort: c_custkey
          merge-join: c_nationkey = s_nationkey
            sort: c_nationkey
              table scan: customer
            sort: s_nationkey
              table scan: supplier

Order Property Optimization

• A Plan Refinement Algorithm (cont.)
• Resulting query plan with Sort removed:

    group: c_custkey, count(*)
      merge-join: c_nationkey = s_nationkey
        sort: c_nationkey
          table scan: customer
        sort: s_nationkey
          table scan: supplier

Order Property Optimization

• A Plan Refinement Algorithm (cont.)

Pass 1: Functional Dependencies and Keys
  – A bottom-up pass in which FDs and keys are propagated upwards when they are inferred to hold of intermediate query results

Pass 2: Required Order Properties
  – A top-down pass that propagates required order properties (req) downwards from the root of the tree
  – Pseudocode of this pass is given in SetReq
  – New required order properties are generated by:
    • NOP, if its child is Sort (i.e., the original query includes ORDER BY)
    • Group and Unique (whose input needs to be grouped)
    • Join operators (which propagate 1 order from above into 2 below)
  – All other nodes pass the required order properties they inherit from parent nodes to their child nodes, except for Hash and Append, which propagate the empty order property to their child nodes

Order Property Optimization

• A Plan Refinement Algorithm (cont.)

Pass 3: Sort Elimination
  – A bottom-up pass of the query plan tree that determines which order properties are guaranteed to be satisfied by the outputs of each node (sat), and that concurrently removes any Sort operator n for which an order property in n.left.sat implies n.req
  – Algorithm: InferSat
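
The interplay of the req and sat attributes can be sketched on the plan of the simple example. The code below is a heavily simplified illustration and not the SetReq or InferSat pseudocode of the presentation: the `Node` class, the `key` field (standing in for the result of Pass 1), and the inference hard-coded for Merge are assumptions. Only the Scan, Sort, Merge, Group and NOP operators are modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Prop = Tuple[Tuple[str, str], ...]  # e.g. (("c_custkey", "G"),)


@dataclass
class Node:
    op: str                          # "Scan", "Sort", "Merge", "Group" or "NOP"
    attrs: Tuple[str, ...] = ()      # sort, group or join attributes
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    key: Optional[str] = None        # key of the outer input (assumed known from Pass 1)
    req: Prop = ()
    sat: List[Prop] = field(default_factory=list)


def implies(have: Prop, need: Prop) -> bool:
    """`need` is a prefix of `have`, where an ordering may stand in for a grouping."""
    if len(need) > len(have):
        return False
    return all(a == b and (ka == kb or (ka, kb) == ("O", "G"))
               for (a, ka), (b, kb) in zip(have, need))


def set_req(n: Optional[Node], inherited: Prop = ()) -> None:
    """Top-down pass: record the order property required of each node's input."""
    if n is None:
        return
    if n.op == "NOP" and n.left is not None and n.left.op == "Sort":
        n.req = tuple((a, "O") for a in n.left.attrs)   # ORDER BY in the query
        set_req(n.left, n.req)
    elif n.op == "Group":
        n.req = tuple((a, "G") for a in n.attrs)
        set_req(n.left, n.req)
    elif n.op == "Merge":
        n.req = ((n.attrs[0], "O"),)
        set_req(n.left, ((n.attrs[0], "O"),))
        set_req(n.right, ((n.attrs[1], "O"),))
    else:
        n.req = inherited
        set_req(n.left, inherited)
        set_req(n.right, inherited)


def infer_sat(n: Optional[Node]) -> Optional[Node]:
    """Bottom-up pass: compute sat and drop Sort nodes whose input already suffices."""
    if n is None:
        return None
    n.left, n.right = infer_sat(n.left), infer_sat(n.right)
    if n.op == "Sort":
        if n.req and any(implies(p, n.req) for p in n.left.sat):
            return n.left
        n.sat = [tuple((a, "O") for a in n.attrs)]
    elif n.op == "Merge":
        join_attr = n.attrs[0]
        n.sat = [((join_attr, "O"),)]
        if n.key is not None:
            # Grouped on the outer key within each join group; the key determines
            # the join attribute, so the output is also grouped on the key alone.
            n.sat += [((join_attr, "O"), (n.key, "G")), ((n.key, "G"),)]
    elif n.op in ("Group", "NOP"):
        n.sat = list(n.left.sat)     # simplification
    return n


def ops(n: Optional[Node]) -> List[str]:
    return [] if n is None else [n.op] + ops(n.left) + ops(n.right)


join = Node("Merge", ("c_nationkey", "s_nationkey"), key="c_custkey",
            left=Node("Sort", ("c_nationkey",), left=Node("Scan", ("customer",))),
            right=Node("Sort", ("s_nationkey",), left=Node("Scan", ("supplier",))))
root = Node("NOP", left=Node("Group", ("c_custkey",),
                             left=Node("Sort", ("c_custkey",), left=join)))
before = ops(root)
set_req(root)
root = infer_sat(root)
after = ops(root)
```

In this sketch the Sort on c_custkey disappears, while the two Sorts below the merge join stay because the table scans guarantee no order property.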

Example: TPC-D (now TPC-H) Query 3

    select l.orderkey, sum(l.extendedprice * (1 - l.discount)) as rev,
           o.orderdate, o.shippriority
    from customer c, order o, lineitem l
    where o.orderkey = l.orderkey
      and c.custkey = o.custkey
      and c.mktsegment = 'building'
      and o.orderdate < date('1998-11-30')
      and l.shipdate > date('1998-11-30')
    group by l.orderkey, o.orderdate, o.shippriority
    order by rev desc, o.orderdate

Example: TPC-D (now TPC-H) Query 3

• Previous presentation:
  – the optimized plan outperformed the original plan by a factor of 2
• Now:
  – further improvements due to reasoning about groupings and secondary orderings

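Example: TPC-D (now TPC-H) Query 3

Postgres plan (figure); operators:
  – sort: rev, o_orderdate
  – group by: o_orderkey
  – sort: o_orderkey
  – nested-loops join: o_orderkey = l_orderkey, with an index scan on lineitem
  – merge-join: c_custkey = o_custkey
  – sort: c_custkey, over a table scan of customer
  – sort: o_custkey, over a table scan of order

Order properties inferred along the plan, from the bottom up, where O_p(X) states that order property p holds of X:
  1. O_{c_custkey^O}(R) ⇒ O_{c_custkey^G}(R) and O_{o_custkey^O}(S) ⇒ O_{o_custkey^G}(S)
  2. Identity #5 ⇒ O_{c_custkey^G → o_orderkey^G}(S)
  3. MJ rule ⇒ O_{c_custkey^G → c_custkey^G → o_custkey^G → o_orderkey^G}(T)
  4. With c_custkey = o_custkey ⇒ O_{o_custkey^G → o_custkey^G → o_custkey^G → o_orderkey^G}(T)
  5. Identity #4 ⇒ O_{o_custkey^G → o_orderkey^G}(T)
  6. Identity #5 ⇒ O_{o_orderkey^G}(T)
  7. NLJ rule ⇒ O_{o_orderkey^G}(U)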

Performance Results

• TPC-D (now TPC-H) Results
• Database:
  – Customer table: 150,000 rows
  – Supplier table: 10,000 rows
  – Order table: 1,500,000 rows
  – LineItem table: 6,000,000 rows
• PC:
  – 1 GHz Pentium III
  – Linux, with 512 MB RAM, 120 GB HDD

Performance Results

Experiment #1: our example

Postgres plan (figure): group (c_custkey, count(*)) over sort (c_custkey) over merge-join (c_nationkey = s_nationkey) over the sorted table scans of customer (on c_nationkey) and supplier (on s_nationkey).

    Postgres Plan    Refined      Ratio
    6384.9 sec       487.9 sec    13.08

N.B.: the merge-join result is huge (60 million rows).

Performance Results

Experiment #2: TPC-H Query 3

Postgres plan (figure); operators: sort (rev, o_orderdate), group by (o_orderkey), sort (o_orderkey), nested-loops join (o_orderkey = l_orderkey) with an index scan on lineitem, merge-join (c_custkey = o_custkey), sort (c_custkey) over a table scan of customer, sort (o_custkey) over a table scan of order.

    Postgres Plan    Refined       Ratio
    126.8 sec        2729.9 sec    0.05

In the Postgres plan, tuples with the same value of o_orderkey were consecutive, which increased the likelihood of finding joining tuples from lineitem in the cache.

Performance Results

Experiment #2: TPC-H Query 3, with table scan on lineitem

Same plan (figure), except that lineitem is read with a table scan instead of an index scan.

    Postgres Plan    Refined      Ratio
    121.4 sec        113.3 sec    1.07

Cost of additional optimization

How much do we pay for plan refinement?

We pay most when it actually pays off! (Queries Q1, Q5, Q10: no refinement)

Conclusion

• Formal approach to order optimization that integrates both orderings and groupings within the same comprehensive framework

• Also considered secondary orderings and groupings

• By inferring secondary orderings and groupings, it is possible to avoid unnecessary sorting or grouping over multiple attributes

• Use secondary orderings known of an operator's input to infer primary orderings of its output