University of Konstanz | Advances in Database Query Processing | Sahak Maloyan
Avoiding Sorting and Grouping in Processing Queries
Sahak Maloyan
Outline
• Motivation
• Simple Example
• Order Properties
• Grouping followed by ordering
• Order Property Optimization
• Performance Results
• Conclusion
Motivation
• Previous presentation:
  – Fundamental Techniques for Order Optimization
  – Using FDs and selection predicates
  – Determining order propagation from input to output
  – Infer from ordering
• Current presentation:
  – Aside from orderings, we also infer how relations are grouped (i.e., how records in relations are clustered according to the values of certain attributes)
  – Infer from grouping
  – Infer from secondary ordering
Motivation (cont.)
• Inferred orderings
  – Make it possible to avoid sorting when processing ORDER BY clauses of SQL queries
• Inferred groupings
  – Avoid sorting or hashing prior to computing aggregates for GROUP BY clauses
  – Reduce the cost of projection with duplicate elimination
  – Complete projection and duplicate elimination in a single pass
  – Reduce the cost of evaluating selection queries of the form $\sigma_{A=k}(R)$ in the absence of indexes or an ordering on A
• Inference of secondary orderings and groupings
  – Avoid unnecessary sorting or grouping over multiple attributes
  – Infer new primary orderings or groupings (example follows)
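Two of the grouping benefits listed above can be made concrete with a minimal Python sketch. It assumes the input is a sequence of tuples that is grouped, but not necessarily sorted, on the column at index `col`. Because all tuples with A = k are contiguous, a selection can stop scanning as soon as that group ends. Because duplicates are adjacent, projection with duplicate elimination only has to emit one value per run. The function names are hypothetical and not taken from Postgres.

```python
from itertools import groupby
from typing import Any, Iterable, Iterator, List, Sequence

Row = Sequence[Any]

def select_eq_grouped(rows: Iterable[Row], col: int, k: Any) -> List[Row]:
    """sigma_{A=k}(R) for input grouped on column `col`: since all tuples with A = k
    are contiguous, the scan stops as soon as the A = k group has ended."""
    out: List[Row] = []
    for row in rows:
        if row[col] == k:
            out.append(row)
        elif out:  # the A = k group has ended, so no further match can follow
            break
    return out

def project_distinct_grouped(rows: Iterable[Row], col: int) -> Iterator[Any]:
    """Projection on `col` with duplicate elimination in a single pass over input
    grouped on `col`: duplicates are adjacent, so one value per run suffices."""
    for value, _run in groupby(rows, key=lambda r: r[col]):
        yield value

rows = [(3, "a"), (3, "b"), (1, "c"), (2, "d"), (2, "e")]  # grouped on A, not sorted
print(select_eq_grouped(rows, 0, 1))            # [(1, 'c')]
print(list(project_distinct_grouped(rows, 0)))  # [3, 1, 2]
```

Without the grouping guarantee neither shortcut is safe. A later A = k tuple could be missed, or a duplicate value could be emitted twice.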
Simple Example
• Benefits of inferring grouping and secondary ordering
TPC-H Query
SELECT c_custkey, COUNT(*)
FROM Customer, Supplier
WHERE c_nationkey = s_nationkey
GROUP BY c_custkey
For each customer: how many suppliers could supply the customer directly, without having to go through customs?
Simple Example (cont.)
Postgres QEP of the query:
group c_custkey, count(*)
  sort c_custkey
    merge-join c_nationkey = s_nationkey
      sort c_nationkey
        table scan customer
      sort s_nationkey
        table scan supplier
The Postgres plan first sorts the join result on the grouping attribute c_custkey so that it can aggregate over groups in a single pass. But one-pass aggregation only requires the data to be grouped, not sorted!
The sort-merge join result is sorted (and hence grouped) on c_nationkey, and within each c_nationkey group the output tuples are themselves grouped on the key of the outer relation (c_custkey).
$\text{c\_nationkey}^G \rightarrow \text{c\_custkey}^G$ ⇒ no sort needed (c_custkey is the key of Customer and therefore determines c_nationkey, so every c_custkey group lies inside a single c_nationkey group)
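The grouping claim above can be checked on toy data. This sketch assumes a textbook merge join that, for every outer tuple, emits its whole run of matching inner tuples before moving on to the next outer tuple. The rows and helper names are hypothetical. The output is sorted on c_nationkey, while c_custkey comes out grouped but not sorted (7 before 2).

```python
from itertools import groupby
from typing import Dict, List

Row = Dict[str, int]

def merge_join(outer: List[Row], inner: List[Row], a: str, b: str) -> List[Row]:
    """Equijoin outer.a = inner.b over inputs already sorted on their join attributes.
    For each outer tuple, the whole run of matching inner tuples is emitted consecutively."""
    out: List[Row] = []
    i = j = 0
    while i < len(outer) and j < len(inner):
        key_o, key_i = outer[i][a], inner[j][b]
        if key_o < key_i:
            i += 1
        elif key_o > key_i:
            j += 1
        else:
            end = j
            while end < len(inner) and inner[end][b] == key_o:
                end += 1
            # pair every outer tuple with this key with the inner run [j, end)
            while i < len(outer) and outer[i][a] == key_o:
                out.extend({**outer[i], **inner[k]} for k in range(j, end))
                i += 1
            j = end
    return out

def ordered_then_grouped(rows: List[Row], major: str, minor: str) -> bool:
    """Checks major^O -> minor^G: runs of major values are distinct and ascending,
    and inside each major run the minor values form distinct (contiguous) runs."""
    majors = [k for k, _ in groupby(rows, key=lambda r: r[major])]
    if majors != sorted(set(majors)):
        return False
    for _, run in groupby(rows, key=lambda r: r[major]):
        minors = [k for k, _ in groupby(run, key=lambda r: r[minor])]
        if len(minors) != len(set(minors)):
            return False
    return True

# Hypothetical Customer and Supplier rows, each sorted on its nationkey
customers = [{"c_custkey": 7, "c_nationkey": 1}, {"c_custkey": 2, "c_nationkey": 1},
             {"c_custkey": 5, "c_nationkey": 3}]
suppliers = [{"s_suppkey": 10, "s_nationkey": 1}, {"s_suppkey": 11, "s_nationkey": 1},
             {"s_suppkey": 12, "s_nationkey": 3}]

joined = merge_join(customers, suppliers, "c_nationkey", "s_nationkey")
print([r["c_custkey"] for r in joined])                         # [7, 7, 2, 2, 5]
print(ordered_then_grouped(joined, "c_nationkey", "c_custkey"))  # True
```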
Order Properties
• Order properties have the form $A_1^{\alpha_1} \rightarrow A_2^{\alpha_2} \rightarrow \cdots \rightarrow A_n^{\alpha_n}$
• Each $A_i$ is an attribute, and each $\alpha_i$ specifies either an ordering ($\alpha_i = O$) or a grouping ($\alpha_i = G$)
• $A_1^{\alpha_1}$ is the primary ordering or grouping, and $A_2^{\alpha_2}$ is the secondary one
• Order properties are formalized with an algebra of constructors with the following signatures:
  – empty ordering
  – combination of orderings
  – basic orderings: order or group
Grouping followed by ordering
• Suppose that R = (A, B) consists of 10 tuples, t1, …, t10, and that its physical representation satisfies the order property $A^O \rightarrow B^G$. This situation is illustrated on the next slide.
Grouping followed by ordering (cont.)
[Figure: An illustration of $A^O \rightarrow B^G$ on R = {t1, …, t10}. The A = 1 group (t1, t2, t3) precedes the A = 2 group (t8, t9, t10), which precedes the A = 3 group (t4, t5, t6, t7). Within the A = 1 group, t1 and t2 share a B value; t8, t9 and t10 have three different B values; the A = 3 group forms two B clusters, {t4, t5} and {t6, t7}.]
The primary ordering ($A^O$) says that the group of tuples with A = 1 precedes the group of tuples with A = 2, which precedes the group with A = 3.
The secondary grouping ($B^G$) says that within each group of tuples with like values of A, tuples with the same value of B are clustered together.
For example, t1 can precede t2 or t2 can precede t1, but they must be adjacent.
Two example permutations that satisfy the order property:
t2, t1, t3, t10, t8, t9, t6, t7, t4, t5
t1, t2, t3, t9, t8, t10, t4, t5, t6, t7
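A small checker makes the definition concrete: a sequence satisfies $A_1^{\alpha_1} \rightarrow \cdots$ if the $A_1$ values form contiguous runs (ascending when $\alpha_1 = O$) and the rest of the property holds inside every run. This is a sketch under assumptions: the concrete B values below are hypothetical, chosen only to agree with the grouping in the figure, and `satisfies` and `layout` are illustrative names.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, int]
OrderProperty = List[Tuple[str, str]]  # [(attribute, "O" or "G"), ...], primary first

def satisfies(rows: Sequence[Row], prop: OrderProperty) -> bool:
    """Does this physical sequence of rows satisfy A1^a1 -> A2^a2 -> ... ?"""
    if not prop:
        return True  # the empty order property holds of every sequence
    attr, kind = prop[0]
    runs = [list(run) for _, run in groupby(rows, key=lambda r: r[attr])]
    firsts = [run[0][attr] for run in runs]
    if len(set(firsts)) != len(firsts):
        return False  # some value of attr is split over two runs: not grouped
    if kind == "O" and firsts != sorted(firsts):
        return False  # grouped, but the groups are not in ascending order
    # the rest of the property must hold inside every group of like attr values
    return all(satisfies(run, prop[1:]) for run in runs)

# Hypothetical B values, chosen to agree with the grouping in the figure
R: Dict[str, Row] = {
    "t1": {"A": 1, "B": 1}, "t2": {"A": 1, "B": 1}, "t3": {"A": 1, "B": 2},
    "t8": {"A": 2, "B": 1}, "t9": {"A": 2, "B": 3}, "t10": {"A": 2, "B": 2},
    "t4": {"A": 3, "B": 1}, "t5": {"A": 3, "B": 1},
    "t6": {"A": 3, "B": 2}, "t7": {"A": 3, "B": 2},
}

def layout(names: str) -> List[Row]:
    """Turn 't2 t1 t3 ...' into the corresponding physical sequence of rows."""
    return [R[n] for n in names.split()]

prop = [("A", "O"), ("B", "G")]
print(satisfies(layout("t2 t1 t3 t10 t8 t9 t6 t7 t4 t5"), prop))  # True
print(satisfies(layout("t1 t2 t3 t9 t8 t10 t4 t5 t6 t7"), prop))  # True
print(satisfies(layout("t1 t3 t2 t9 t8 t10 t4 t5 t6 t7"), prop))  # False: t1, t2 not adjacent
```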
Computing with Order Properties (cont.)
• The general properties have the form:
• Shorthand:
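• Also, given $o_1 = A_1^{\alpha_1} \rightarrow \cdots \rightarrow A_n^{\alpha_n}$ and $o_2 = B_1^{\beta_1} \rightarrow \cdots \rightarrow B_m^{\beta_m}$, the shorthand $o_1 \rightarrow o_2$ (concatenation of order properties) denotes $A_1^{\alpha_1} \rightarrow \cdots \rightarrow A_n^{\alpha_n} \rightarrow B_1^{\beta_1} \rightarrow \cdots \rightarrow B_m^{\beta_m}$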
Order Properties (cont.)
• For any order property that holds of a physical relation R, all prefixes of that order property also hold of R
• An ordering on any attribute implies a grouping on that attribute
• If a set of attributes X functionally determines B, and an order property includes all attributes of X (ordered or grouped) before $B^{\alpha}$, then $B^{\alpha}$ is superfluous
• Identities
Order Properties (cont.)
• Identities (cont.)
• special case of identity #3, covering the case where X consists of a single attribute
• the grouping of an attribute that is functionally determined by the attribute that follows it in the order property is superfluous
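Several of these rules can be sketched as operations on order properties written as lists of (attribute, "O"/"G") pairs, with an FD X → B written as the pair (X, B). All names are hypothetical, and no FD closure is computed. `implies` covers the prefix rule and "an ordering implies a grouping". `drop_superfluous` covers the FD rule, where a repeated attribute is the single-attribute special case. `drop_determined_grouping` applies the last rule only when the following entry is itself a grouping, which is the case both examples in this presentation use; its result is implied by its input, not necessarily equivalent to it.

```python
from typing import FrozenSet, List, Set, Tuple

OrderProperty = List[Tuple[str, str]]  # [(attribute, "O" or "G"), ...]
FD = Tuple[FrozenSet[str], str]         # (X, B) stands for X -> B

def determines(attrs: Set[str], b: str, fds: List[FD]) -> bool:
    """b is trivially determined by a set containing it, or via a listed FD X -> b
    with X a subset of attrs (no closure is computed in this sketch)."""
    return b in attrs or any(rhs == b and lhs <= attrs for lhs, rhs in fds)

def drop_superfluous(prop: OrderProperty, fds: List[FD]) -> OrderProperty:
    """FD rule: drop B^alpha when the attributes listed before it determine B."""
    seen: Set[str] = set()
    out: OrderProperty = []
    for attr, kind in prop:
        if not determines(seen, attr, fds):
            out.append((attr, kind))
        seen.add(attr)
    return out

def drop_determined_grouping(prop: OrderProperty, fds: List[FD]) -> OrderProperty:
    """Drop A^alpha when it is immediately followed by B^G and B -> A (an ordering on A
    is first weakened to a grouping). The result is implied by prop."""
    out = list(prop)
    i = 0
    while i + 1 < len(out):
        (a, _), (b, kb) = out[i], out[i + 1]
        if kb == "G" and determines({b}, a, fds):
            del out[i]
            i = max(i - 1, 0)  # the new neighbouring pair must be re-examined
        else:
            i += 1
    return out

def implies(sat: OrderProperty, req: OrderProperty) -> bool:
    """Prefix rule plus 'an ordering implies a grouping'."""
    return len(req) <= len(sat) and all(
        ra == sa and (rk == "G" or sk == "O") for (ra, rk), (sa, sk) in zip(req, sat))

# Simple example: c_custkey is the key of Customer, so c_custkey -> c_nationkey
fds = [(frozenset({"c_custkey"}), "c_nationkey")]
mj_sat = [("c_nationkey", "O"), ("c_custkey", "G")]
req = [("c_custkey", "G")]
print(implies(mj_sat, req))                                 # False
print(implies(drop_determined_grouping(mj_sat, fds), req))  # True

# Query 3 style: collapse repeated o_custkey, then use o_orderkey -> o_custkey
q3 = drop_superfluous([("o_custkey", "G")] * 3 + [("o_orderkey", "G")], [])
print(drop_determined_grouping(q3, [(frozenset({"o_orderkey"}), "o_custkey")]))  # [('o_orderkey', 'G')]
```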
Order Property Inference
• Using the algebra of order properties and their formal definitions, we can derive inference rules that state how order properties propagate through relational operators, e.g., joins:
Order Property Optimization
• Postgres Plan Operators Summarized
• The data structures for all plan nodes in Postgres include the following fields:
  – inp1, …, inpn: the fields contained in all input tuples to the node
  – left: the left subtree of the node (set to Null for leaf nodes and Append)
  – right: the right subtree of the node (set to Null for leaf nodes, unary operators and Append)
Order Property Optimization
• Postgres Plan Operators Summarized (cont.)
• Additional operator-specific fields provided by Postgres and used by our refinement algorithm
Order Property Optimization
• Postgres Plan Operators Summarized (cont.)
• Group: performs two passes over its input:
  – insert Null values between pairs of consecutive tuples with different values for attributes att1, …, attk;
  – apply functions Fk+1, …, Fn to the collection of values of attributes attk+1, …, attn, respectively, for each set of tuples separated by Nulls.
• Hash: builds a hash table over its input using a predetermined hash function over attribute att.
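Because Group only needs tuples with like values to be adjacent, its two passes can be fused into a single streaming pass over grouped input. A minimal sketch using Python's `itertools.groupby`, which starts a new group exactly where two consecutive rows differ on the key, i.e. where Group would insert a Null. The function name and the row layout (grouping columns first, then the aggregated columns) are assumptions.

```python
from itertools import groupby
from typing import Any, Callable, List, Sequence, Tuple

Agg = Callable[[List[Any]], Any]

def group_aggregate(rows: Sequence[Sequence[Any]], k: int, aggs: List[Agg]) -> List[Tuple[Any, ...]]:
    """Aggregate over input that is grouped (not necessarily sorted) on its first k columns.
    A new group starts wherever consecutive rows differ on those columns; aggs[j] is
    applied to the values of column k + j within each group."""
    out: List[Tuple[Any, ...]] = []
    for key, run in groupby(rows, key=lambda r: tuple(r[:k])):
        members = list(run)
        out.append(key + tuple(f([r[k + j] for r in members]) for j, f in enumerate(aggs)))
    return out

# Join output grouped (not sorted) on c_custkey; second column is s_suppkey (hypothetical data)
rows = [(7, 10), (7, 11), (2, 10), (2, 11), (5, 12)]
print(group_aggregate(rows, 1, [len]))  # [(7, 2), (2, 2), (5, 1)]
```

On input that is not grouped, the same key would show up in several output rows, which is why the Sort (or Hash) before Group can only be dropped when the grouping is guaranteed.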
Order Property Optimization
• HJoin: performs a (non-order-preserving) simple hash equijoin (att1 = att2) with the relation produced by left as the probe relation, and the relation produced by right as the build relation.
• Merge: performs a merge equijoin (att1 = att2) with the relation produced by left as the outer relation, and the relation produced by right as the inner relation.
• NOP: has been added as a dummy plan operator that is temporarily made the root of a Postgres plan prior to its refinement.
Order Property Optimization
• A Plan Refinement Algorithm
  – Input: query plan tree generated by Postgres
  – Output: an equivalent plan tree with unnecessary Sort operators (used either to order or to group) removed
  – Requires: 4 new attributes associated with every node in a query plan tree
Order Property Optimization
• A Plan Refinement Algorithm (cont.)
• New attributes of each plan node n:
  – keys: a set of attribute sets that are guaranteed to be keys of inputs to n
  – fds: a set of functional dependencies (attribute set → attribute) that are guaranteed to hold of inputs to n
  – req: a single order property that is required to hold of inputs either to n or to some ancestor node of n for that node to execute
  – sat: a set of order properties that are guaranteed to be satisfied by outputs of n
Order Property Optimization
• A Plan Refinement Algorithm (cont.)
• Idea:
  – decorate the input plan with these new attributes
  – remove any Sort operator whose child node produces a result that is guaranteed to satisfy an order property required by its parent node
• Accomplished in 3 passes over the input plan
Order Property Optimization
• A Plan Refinement Algorithm (cont.)
• Refinement of the query plan (NOP temporarily made the root):
NOP
  group c_custkey, count(*)
    sort c_custkey
      merge-join c_nationkey = s_nationkey
        sort c_nationkey
          table scan customer
        sort s_nationkey
          table scan supplier
Order Property Optimization
• A Plan Refinement Algorithm (cont.)
• Resulting query plan with the Sort on c_custkey removed:
group c_custkey, count(*)
  merge-join c_nationkey = s_nationkey
    sort c_nationkey
      table scan customer
    sort s_nationkey
      table scan supplier
Order Property Optimization
• A Plan Refinement Algorithm (cont.)
• Pass 1: Functional Dependencies and Keys
  – A bottom-up pass: FDs and keys are propagated upwards when they are inferred to hold of intermediate query results
• Pass 2: Required Order Properties
  – A top-down pass that propagates required order properties (req) downwards from the root of the tree
  – Pseudocode of this pass is given in SetReq (next slide)
  – New required order properties are generated by:
    • NOP: if its child is Sort, i.e., the original query includes ORDER BY
    • Group and Unique (whose input needs to be grouped)
    • Join operators (propagate one order property from above into two below)
  – All other nodes pass the required order properties they inherit from their parent nodes to their child nodes, except for Hash and Append, which propagate the empty order property to their child nodes
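For Pass 1, the step at an equijoin node can be sketched with the standard rules for inner joins, which are assumed here rather than taken from the slides: every input FD still holds of the output, the join predicate makes the two join attributes determine each other, and a key of the left input together with a key of the right input identifies an output tuple. The helper name is hypothetical; the keys used are those of the TPC-H Customer and Supplier tables.

```python
from typing import FrozenSet, List, Set, Tuple

Key = FrozenSet[str]
FD = Tuple[FrozenSet[str], str]  # (X, B) stands for X -> B

def equijoin_keys_fds(left_keys: Set[Key], left_fds: List[FD],
                      right_keys: Set[Key], right_fds: List[FD],
                      a: str, b: str) -> Tuple[Set[Key], List[FD]]:
    """Hypothetical Pass 1 step for an equijoin left.a = right.b."""
    # each output tuple glues one left tuple to one right tuple, so input FDs keep holding
    fds = list(left_fds) + list(right_fds)
    # the join predicate: a and b are equal, hence determine each other
    fds += [(frozenset({a}), b), (frozenset({b}), a)]
    # a left key plus a right key identifies an output tuple
    keys = {kl | kr for kl in left_keys for kr in right_keys}
    return keys, fds

# Inputs of the simple example, with the FDs induced by their keys
cust_keys = {frozenset({"c_custkey"})}
cust_fds = [(frozenset({"c_custkey"}), "c_nationkey")]
supp_keys = {frozenset({"s_suppkey"})}
supp_fds = [(frozenset({"s_suppkey"}), "s_nationkey")]

keys, fds = equijoin_keys_fds(cust_keys, cust_fds, supp_keys, supp_fds,
                              "c_nationkey", "s_nationkey")
print(frozenset({"c_custkey", "s_suppkey"}) in keys)     # True
print((frozenset({"c_custkey"}), "c_nationkey") in fds)  # True
```

The second result is the dependency Pass 3 relies on in the simple example: c_custkey → c_nationkey still holds above the merge join.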
Order Property Optimization
Order Property Optimization
• A Plan Refinement Algorithm (cont.)
• Pass 3: Sort Elimination
  – A bottom-up pass of the query plan tree that determines which order properties are guaranteed to be satisfied by the outputs of each node (sat), and that concurrently removes any Sort operator n for which n.left.sat ∋ n.req
  – Algorithm: InferSat (next slides)
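A simplified sketch of the Sort-elimination step follows. It assumes Pass 2 has already filled in `req` and hard-codes the `sat` sets that InferSat would compute (InferSat itself is not reproduced). Where the slide tests n.left.sat ∋ n.req, this sketch uses a prefix-and-weakening check so the hard-coded sat sets need not list every implied property. `Node`, `eliminate_sorts` and the operator strings are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

OrderProperty = List[Tuple[str, str]]  # [(attribute, "O" or "G"), ...]

def implies(sat: OrderProperty, req: OrderProperty) -> bool:
    """req holds if it is a prefix of sat, where an ordering also counts as a grouping."""
    return len(req) <= len(sat) and all(
        ra == sa and (rk == "G" or sk == "O") for (ra, rk), (sa, sk) in zip(req, sat))

@dataclass
class Node:
    op: str                                                 # "sort", "group", "merge-join", ...
    left: Optional[Node] = None
    right: Optional[Node] = None
    sort_on: List[str] = field(default_factory=list)        # Sort nodes only
    req: OrderProperty = field(default_factory=list)        # assumed already set by Pass 2
    sat: List[OrderProperty] = field(default_factory=list)  # hard-coded stand-in for InferSat

def eliminate_sorts(n: Optional[Node]) -> Optional[Node]:
    """Bottom-up: refine both subtrees, then replace a Sort by its child if the child
    already satisfies the order property required of the Sort's output."""
    if n is None:
        return None
    n.left = eliminate_sorts(n.left)
    n.right = eliminate_sorts(n.right)
    if n.op == "sort" and n.left is not None:
        if any(implies(s, n.req) for s in n.left.sat):
            return n.left
        n.sat = [[(a, "O") for a in n.sort_on]]  # a Sort that stays orders its output
    return n

# The simple example's plan (below the temporary NOP root)
customer = Node("table scan customer")
supplier = Node("table scan supplier")
join = Node("merge-join",
            Node("sort", customer, sort_on=["c_nationkey"], req=[("c_nationkey", "O")]),
            Node("sort", supplier, sort_on=["s_nationkey"], req=[("s_nationkey", "O")]),
            # what InferSat could derive, given c_custkey -> c_nationkey (assumed here)
            sat=[[("c_nationkey", "O"), ("c_custkey", "G")], [("c_custkey", "G")]])
plan = Node("group", Node("sort", join, sort_on=["c_custkey"], req=[("c_custkey", "G")]))

plan = eliminate_sorts(plan)
print(plan.left.op)       # merge-join
print(plan.left.left.op)  # sort (the c_nationkey sort is still needed)
```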
Order Property Optimization
• A Plan Refinement Algorithm (cont.)
• InferSat
Order Property Optimization
• A Plan Refinement Algorithm (cont.)
• InferSat (cont.)
Example: TPC-D (now TPC-H) Query 3
TPC-D Query 3
SELECT l.orderkey, SUM(l.extendedprice * (1 - l.discount)) AS rev, o.orderdate, o.shippriority
FROM customer, order, lineitem
WHERE o.orderkey = l.orderkey
  AND c.custkey = o.custkey
  AND c.mktsegment = 'building'
  AND o.orderdate < date('1998-11-30')
  AND l.shipdate > date('1998-11-30')
GROUP BY l.orderkey, o.orderdate, o.shippriority
ORDER BY rev DESC, o.orderdate
Example: TPC-D (now TPC-H) Query 3
• Previous presentation:
  – the optimized plan outperformed the original plan by a factor of 2
• Now:
  – further improvements due to reasoning about groupings and secondary orderings
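Example: TPC-D (now TPC-H) Query 3
Postgres plan (R, S: sorted inputs of the merge join; T: merge-join output; U: nested-loops output):
sort rev, o_orderdate
  group by o_orderkey
    nested-loops o_orderkey = l_orderkey   (U)
      sort o_orderkey
        merge-join c_custkey = o_custkey   (T)
          sort c_custkey   (R)
            table scan customer
          sort o_custkey   (S)
            table scan order
      index scan lineitem
Inference, from the sorted inputs upwards:
$\text{c\_custkey}^O\,(R) \Rightarrow \text{c\_custkey}^G\,(R)$ and $\text{o\_custkey}^O\,(S) \Rightarrow \text{o\_custkey}^G\,(S)$
Identity #5 ⇒ $\text{o\_custkey}^G \rightarrow \text{o\_orderkey}^G\,(S)$
MJ rule ⇒ $\text{c\_custkey}^G \rightarrow \text{c\_custkey}^G \rightarrow \text{o\_custkey}^G \rightarrow \text{o\_orderkey}^G\,(T)$
and c_custkey = o_custkey ⇒ $\text{o\_custkey}^G \rightarrow \text{o\_custkey}^G \rightarrow \text{o\_custkey}^G \rightarrow \text{o\_orderkey}^G\,(T)$
Identity #4 ⇒ $\text{o\_custkey}^G \rightarrow \text{o\_orderkey}^G\,(T)$
Identity #5 ⇒ $\text{o\_orderkey}^G\,(T)$
NLJ rule ⇒ $\text{o\_orderkey}^G\,(U)$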
Performance Results
• TPC-D (now TPC-H) results
• Database:
  – Customer table: 150,000 rows
  – Supplier table: 10,000 rows
  – Order table: 1,500,000 rows
  – LineItem table: 6,000,000 rows
• PC:
  – 1 GHz Pentium III
  – Linux, 512 MB RAM, 120 GB HDD
Performance Results
Experiment #1: our example (the Postgres plan of the simple example; the refined plan omits the Sort on c_custkey)
Postgres Plan | Refined | Ratio
6384.9 sec | 487.9 sec | 13.08
N.B.: The merge-join result is huge (60 million rows).
Performance Results
Experiment #2: TPC-H Query 3 (plan as in the Query 3 example, with an index scan on lineitem)
Postgres Plan | Refined | Ratio
126.8 sec | 2729.9 sec | 0.05
In the Postgres plan, tuples with the same value of o_orderkey were consecutive, which increased the likelihood of finding the joining lineitem tuples in the cache.
Performance Results
Experiment #2: TPC-H Query 3 with a table scan on lineitem (same plan, with a table scan instead of an index scan on lineitem)
Postgres Plan | Refined | Ratio
121.4 sec | 113.3 sec | 1.07
Cost of additional optimization
How much do we pay for plan refinement?
We pay the most when it actually pays off! (Queries Q1, Q5, Q10: no refinement.)
Conclusion
• Formal approach to order optimization that integrates both orderings and groupings within the same comprehensive framework
• Also considered secondary orderings and groupings
• By inferring secondary orderings and groupings, it is possible to avoid unnecessary sorting or grouping over multiple attributes
• Use secondary orderings known of an operator's input to infer primary orderings of its output