Download ppt - 5/10/20151 PSU’s CS 587 14. Evaluating Relational Operators Table Statistics Operators and example schema Selection Projection Equijoins General

04/18/23 1PSU’s CS 587

14. Evaluating Relational Operators

Table Statistics Operators and example schema Selection Projection Equijoins General Joins Set Operators Buffering Intergalactic standard reference

G. Graefe, "Query Evaluation Techniques for Large Databases," ACM Computing Surveys, 25(2) (1993), pp. 73-170

04/18/23 2PSU’s CS 587

Learning Objectives In a typical major DBMS, what statistics

are automatically collected and when? Given collected statistics, estimate a

predicate’s output size/selectivity. For each relational operator, describe the

major algorithms, their optimizations, their pros and cons, and their costs Selection Projection Equijoins General Joins Set Operators

04/18/23 3PSU’s CS 587

Motivation/Review

Assume Sailors has an index on age. Does the optimal plan for this query use

the index?SELECT *FROM Sailors SWHERE S.age < 31 Moral: In order to choose the optimal

plan we need to know the selectivity of the predicate/size of the output.

04/18/23 4PSU’s CS 587

Collecting Statistics What are statistics

Table sizes, index sizes, ranges of values, etc. Where are statistics kept? In the system catalog. Why collect statistics?

Statistics are needed to determine selectivity of predicates and sizes of outputs of operators.

Data in the previous bullet is needed to calculate costs of plans.

Data in the previous bullet is needed by the optimizer to find the optimal plan.

How often are statistics collected? Typically done when 10% of the data has been updated.

• Can be overidden manually• UPDATE STATISTICS, ANALYZE

Typically done by sampling, if table is large Which tables/columns are monitored?

Typically all tables/columns are monitored

04/18/23 5PSU’s CS 587

Sailors Example What statistics

would you collect to find the number of rows satisfying Age < 31? Rank =4 ?

In Intro DB we assumed data was uniformly distributed.

30/30.2/30.4/30.6/30.8/30.9/31/32/33/34.5

35/36/36.6/37.8/39/39.5/40/41/43/44

/46/48/50/51/52/54/56/58/59/60

04/18/23 6PSU’s CS 587

Histograms Histograms are more accurate than the uniformity

assumption. Equiwidth Histogram:

Estimate number of rows/selectivity for the predicate “age< 31”:

Equidepth Histogram:

Estimate number of rows/selectivity for the predicate “age< 31”:

Actual value: 6. Moral: Equidepth histograms are more accurate.

#valsrange 30 33 36 39 42 45 48 51 54 57

9 3 3 3 2 1 2 2 2 3

#valsrange 30 30.430.9 33 36 39 41 46 51 56

3 3 3 3 3 3 3 3 3 3

04/18/23 7PSU’s CS 587

Real DBMSs Store Histograms and More

The next slide is the result of:SELECT attname, n_distinct, most_common_vals,

most_common_freqs, histogram_bounds

FROM pg_stats WHERE tablename = 'sailors';

n_distinct: positive means number of distinct values, negative means distinct values over number of rows, so -1 means unique

most_common_vals: NULL if no values are more common than others

histogram: equidepth, for values except most common

04/18/23 8PSU’s CS 587

Example Statistics: Postgres

EXPLAIN SELECT * from sailors where age < 31;

QUERY PLAN Seq Scan on sailors (cost=0.00..1.38 rows=6 width=21)Filter: (age < 31::double precision)

04/18/23 9PSU’s CS 587

Compound Conditions

How many rows: WHERE age < 31 AND rank = 4

• Assume statistical independence

• Selectivity of AND is product of selectivities

WHERE age < 31 OR rank = 4• Typical formula is s1+s2-s1*s2

04/18/23 10PSU’s CS 587

Join Size Estimation

Suppose there are ר rows in R, ש rows in S, R ⋈ S is a join on ri = sj

ri is a foreign key that references sj

How many rows are there in R ⋈ S? Most real joins are foreign key joins

04/18/23 11PSU’s CS 587

Pages, Rows in a Table?

select relpages, reltuples

from pg_class where relname = 'sailors';

04/18/23 12PSU’s CS 587

Assumptions General assumptions:

All attributes are uniformly distributed. 2 Meg buffer space, or 256 8K pages

• So can sort up to 512Mof data, 64K pages, in two passes All indexes are alternative 2, height 2. Data entries are 10% the size of data records. Each I/O takes 10ms

Reserves, Sailors, Boats: Sailors (sid: integer, sname: string, rank: integer, age: real)

• Clustered index on sid• Nonclustered index on rank

Reserves (sid: integer, bid: integer, day: dates, rname: string)

• Clustered index on (sid,bid,day)• Nonclustered index on bid

Boats (bid: integer, color:string)• Clustered index on bidBytes

/RowRows/Page

Pages

Sailors 50 80 500

Reserves

40 100 1000

Boats 16 250 20

Rank: 1 to 10

Bid: 1 to 100

04/18/23 13PSU’s CS 587


Operators and example schema Selection

One term Nonclustered index optimization Multiple terms

Projection Equijoins General Joins Set Operators Buffering

04/18/23 14PSU’s CS 587

14.1 One Table, One Term

• Predicate is of the form R.attr op value, where op is < .

• Assume 10% of rnames begin with A, B.

• Suppose there is a clustered B+ tree index and an unclustered B+ tree index, both on rname.

• What are the possibile plans (algorithms)?

• Which plan will the optimizer choose?

• We’ll compute the cost of each plan/algorithm.

SELECT *FROM Reserves RWHERE R.rname < ‘C%’

14. Op.Eval.

04/18/23 15PSU’s CS 587

What is the optimal plan? File Scan Plan Cost

I/Os Seconds

Nonclustered Index Scan Plan Cost Access first qualifying data entry Access first qualifying data item Access all qualifying data items Total Cost

Clustered Index Scan Plan Cost Access first qualifying data entry Access first qualifying data item Access all qualifying data items Access all qualifying data entries Total Cost

What is the optimal plan?

14. Op.Eval.

04/18/23 16PSU’s CS 587

Index entries

Data entries

(Index File)

Data Records

CLUSTERED

(Data file)

Data entries

Data Records

UNCLUSTERED

04/18/23 17PSU’s CS 587

Costs for a one-term selection on R, with selectivity ρ

File scan Number of pages in R = M

Nonclustered index scan Worst case is at least the number of

qualifying rows in R = (MpR)ρ.

Clustered index scan Hight of index + number of qualifying data

entries + number of qualifying data pages = 2 + .1Mρ + Mρ = 2+1.1Mρ

*The .1 comes from our assumption that data entries are 10% the length of data items, and should be adjusted accordingly.

04/18/23 18PSU’s CS 587

How do the access methods compare?

Remember Clustered Indexes are rare pR is typically large

Conclusions Clustered Index Scan is optimal if ρ< 1/1.1 If there is no clustered index, File Scan is a big win

unless ρ is very small Let’s look at ways to make a Nonclustered

Index Scan even better

File Scan M

Nonclustered Index Scan

At least pR*ρ*M

Clustered Index Scan 2+1.1*ρ*M

04/18/23 19PSU’s CS 587

Nonclustered Index Optimization

Consider this algorithm for using a nonclustered index (the “shopping list” optimization):

1. Retrieve the RIDs in all the qualifying data entries.

2+ ρ*0.1*M2. Sort those RIDs in order of page number. ???3. Fetch data records using those RIDs in sorted order. This

ensures that each data page is retrieved just once (though

the # of such pages likely to be higher than with clustering). At most M

• If ρ is small, cost is much less.

Cost? At most 2+(1+.1ρ)M + ???

14. Op.Eval.

04/18/23 20PSU’s CS 587

A Second Nonclustered Index Optimization: The ??? Term.

What if RIDs are too large to sort in memory? 1. Retrieve the RIDs in all the qualifying data entries.

2’. Make a bitmap, one bit for each page in the table.3’. For each page in a qualifying RID, turn on the corresponding bit in

the bitmap4’. Fetch each page whose bit is on.

5’. Scan every record in every fetched page to find which ones qualify.

Why is step 5’ necessary? Cost: At most 2+(1+.1ρ)M

If ρ is small, cost is much less.

14. Op.Eval.

04/18/23 21PSU’s CS 587

How do the access methods compare?

Conclusions Clustered Index Scan is optimal if ρ< 1/1.1 If there is no clustered index, File Scan is a

slight win unless ρ is very small.

File Scan M

Nonclustered Index Scan

At most 2+(1+.1ρ)M if ρ is

small it will be much less

Clustered Index Scan 2+1.1*ρ*M

04/18/23 22PSU’s CS 587

Access Paths, Matching

An access path is a way of retrieving tuples. A full or partial scan An indexed access

So an optimal plan, in this simple case (one-term WHERE clause), is an optimal access path.

A term matches an access path if that access path can be used to retrieve just the tuples satisfying the term. Example: bid>50 matches what access path?

04/18/23 23PSU’s CS 587

General Selections

First approach: 1. Choose a term (Which one?)2. Retrieve tuples using the optimal access path for that

term.3. Apply any remaining terms, in-memory

Second approach Applicable if we have 2 or more access paths matching

terms and using Alternatives (2) or (3)1. Get sets of rids of data records using each matching

index.2. Then intersect these sets of rids 3. Retrieve the records and apply any remaining terms.

Which approach is cheaper?

14. Op.Eval.

SELECT attribute list FROM relation listWHERE term1 AND ... AND termk k > 1

04/18/23 24PSU’s CS 587


Operators and example schema Selection Projection

What is the problem? Sort and hash approaches

Equijoins General Joins Set Operators Buffering

04/18/23 25PSU’s CS 587

14.3 Projection

Hard part of projection is duplicate elimination

Simple sort-based projection Assume size ratio is 0.50, duplicate ratio is 0.251. Eliminate all attributes except sid, bid.2. Sort the result by (sid,bid). 3. Scan that result, eliminating duplicates

• Cost?

Optimizations? Combine attribute elimination with sort

• Cost? Combine duplicate elimination with sort.

• Best case cost?

SELECT DISTINCT R.sid, R.bidFROM Reserves R

14. Op.Eval.

04/18/23 26PSU’s CS 587

Projection Based on Hashing Simple Hash-based projection

Eliminate unwanted attributes Hash the result into B-1 output buffers and

partitions For each partition: Read in, eliminate

duplicates in memory, write the answer Optimizations?

Combine elimination of unwanted attributes with hashing.

Eliminate duplicates as soon as possible

14. Op.Eval.

04/18/23 27PSU’s CS 587

Discussion of Projection Sort-based approach is the standard; better handling

of skew and result is sorted. If an index on the relation contains all wanted

attributes in its search key, can do index-only scan. Apply projection techniques to data entries (much smaller!)

If an ordered (i.e., tree) index contains all wanted attributes as prefix of search key, can do even better: Retrieve data entries in order (index-only scan), discard

unwanted fields, compare adjacent tuples to check for duplicates.

{Prefixes

Single index (bid, sid, day)

04/18/23 28PSU’s CS 587


Operators and example schema Selection Projection Equijoins

Nested Loop Join Index Nested Loop Join Sort-Merge Join Hash Join Hybrid Hash Join

General Joins Set Operators Buffering

04/18/23 29PSU’s CS 587

14.4 Equijoin With One Join Column

SELECT S.name, R.bidFROM Reserves R, Sailors SWHERE R.sid=S.sid AND bid = 100

Example:

Reserves Sailors

⋈sid=sid

bid=100

sname

04/18/23 30PSU’s CS 587

Nested Loops Join

R is called the outer, S the inner table.

(1) foreach tuple r in R do(2) for each tuple s in S

if ri == sj then output <r, s>

14. Op.Eval.

04/18/23 31PSU’s CS 587

The Join Algorithms Every in-memory join algorithm is in some sense

a special case of Nested Loops Simple Nested Loops: Scan outer, scan inner Index Nested Loops: Scan outer, index scan inner Sort Merge: Inputs must be sorted on joining attributes.

Scan outer in order, scan matching interval of inner. Hash Join: special case of Index Nested Loops, where

index is constructed on-the-fly and inputs are partitioned.

Implications Every join algorithm scans its outer relation. Every join algorithm costs at least M.

14. Op.Eval.

04/18/23 32PSU’s CS 587

Cost of Nested Loops Join (Scan of R)+(# rows in R)*(Scan of S) = M + (M*pR*ρ)*N

Where ρ is the selectivity of the selection on R For simplicity we are assuming that M is accessed

with a file scan. In general M could be accessed with an index scan and the first cost “M” would be replaced by the cost of the index scan.

For our example, the cost is prohibitive: 1000+(1,000)*500 = 501,000 I/Os = 5,010

seconds ~= 1.4 hours Nested loops is used only for small tables.

04/18/23 33PSU’s CS 587

Index Nested Loops Join

Cost: M + (M*pR* ρ) * (Cost of finding matching S tuples) Again, the first cost M might be replaced by an index scan cost.

Cost of finding matching S tuples= cost of probing S index + cost of finding S tuples. For each R tuple, cost of probing S index is about 1.2 for hash

index, 2-4 for B+ tree. Cost of then finding S tuples (assuming Alt. (2) or (3) for data

entries) depends on clustering.• Clustered index: 1 I/O (typical), unclustered: up to 1 I/O per

matching S tuple.

Valid only when S has an index on sj

foreach tuple r in R doforeach tuple s in S where ri == sj use index

hereadd <r, s> to result

14. Op.Eval.

04/18/23 34PSU’s CS 587

Examples of Index Nested Loops

In our example Sailors has a clustered index on sid so we can apply INL.

Cost is M + (M*pR* ρ) * 3 = 1000 + (1000*100*.01) *3 = 4000 I/Os = 40 seconds.

14. Op.Eval.

04/18/23 35PSU’s CS 587

14.4.2 Sort-Merge Join Costs are in Blue.

Assume R, S can be sorted in 2 passes, i.e. M, N < B2

No-brainer Implementation: Sort R 4M Sort S 4N Merge-join R and S M+N Total Cost 5(M+N)

Better Implementation: Sort* R into runs 2M Sort* S into runs 2N Merge runs of R and S and merge-join R and S M+N Total Cost 3(M+N)*Use snowplow/replacement sort optimization

14. Op.Eval.

04/18/23 36PSU’s CS 587

Example of Sort-Merge Join

sid sname rating age22 dustin 7 45.028 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

sid bid day rname

28 103 12/4/96 guppy28 103 11/3/96 yuppy31 101 10/10/96 dustin31 102 10/12/96 lubber31 101 10/11/96 lubber58 103 11/12/96 dustin

14. Op.Eval.

04/18/23 37PSU’s CS 587

Sort-Merge Join Optimization

. .

.

. .

.

Runs from S

Runs from R

.

.

.

.

Fill Buffers

Merge Join Output

04/18/23 38PSU’s CS 587

Hybrid SMJ costs

If R and S are sorted, cost is M+N If R is sorted, cost is M+N +2 ρSN – 2( B - ρSN/B ) Argument similar to preceding page

If neither R nor S are sorted, cost is M+N + 2 ρRM + 2 ρSN -2( B - ρRM/B - ρSN/B )

The initial M and N costs might be replaced by index scan costs if appropriate.

⋈R S

R S

04/18/23 39PSU’s CS 587

Example: Hybrid SMJ

Assume this plan uses a clustered sid index scan of Reserves at a cost of 1102 and a file scan of Sailors at a cost of 500, B = 256.

Plan cost is then M+N +2 ρSN – 2( B - ρSN/B ) = 1102 + 500 +2*.5*500-2(256-.5*500/256) =

1590

⋈bid=100 rank>5

Reserves Sailors

04/18/23 40PSU’s CS 587

14.4.3 Hash-Join [677] Confusing because it combines two distinct

ideas:1. Partition R and S using a hash function

• You should understand how and why this works. Read the text if you don’t.

2. Join corresponding partitions using a version of INL with hash indexes built on-the-fly.• You should understand why the hash function used here

must differ from the hash function used for partitioning. The partition step could be followed by any join

method in step 2. Hashing is used because The tables are large (they needed partitioning) so NLJ

is not a contender. The partitions were just created so they have no

indexes on them, so classic INL is not possible. The partitions were just created so are not sorted, so

SMJ is not a big winner. There may be cases where a sort order is needed later, but this is not considered.

04/18/23 41PSU’s CS 587

Partitionsof R & S

Input bufferfor Si

Hash table for partitionRi (k < B-1 pages)

B main memory buffersDisk

Output buffer

Disk

Join Result

hashfnh2

h2

B main memory buffers DiskDisk

Original Relation OUTPUT

2INPUT

1

hashfunction

h B-1

Partitions

1

2

B-1

. . .

04/18/23 42PSU’s CS 587

Hash-Join Example: PartitionR

16,23,17,86,84,21,61

S

19,17,84,25, 22

Input

0 mod k

1 mod k

k Partitions of R

k Partitions of S

04/18/23 43PSU’s CS 587

Hash-Join Example: INL

Join Output

Partitions of R

Partitions of S

16,86,84

23,17,21,61

84,22

19,17,25,43

Scan partition of S

Hashed partition of R

Output

16,86,84

04/18/23 44PSU’s CS 587

Cost of Hash Join We only need one of M,N to be B2 !

Let it be M Read R, write B partitions: 2M

• Each one is, on the average, at most B pages• Some fudging going on here

Read S, write B partitions: 2N• We don’t care about the size of these partitions

For each of the B partitions of R and S: M+N• Build a hash table for the R-partition• Read each partition of R, and matching partition of S:

M+N

Total: 3(M+N) Can we do better if M << B2? Hybridize!

04/18/23 45PSU’s CS 587

Hybrid Hash Join, M 2B, Partition

Input Buffer

Partition 0 Output Buffer

Partition 1 Of R

Partition 0 of S

Partition 0 of R

S

R

R1 ⋈ S1

04/18/23 46PSU’s CS 587

Hybrid Hash Join, M 2B, INL

R0 ⋈ S0

Output Buffer

Load R Partition 0

Partition 0 of S

Scan S Partition 0

Partition 0 of R

04/18/23 47PSU’s CS 587

Hybrid Hash Join, Partition

Input Buffer

Partitions 0..k-1 Output Buffers

Partition k Of R

Partitions 0..k-1 of S

Partitions 0..k-1 of R

S

R

Rk ⋈ Sk

…

…

…

04/18/23 48PSU’s CS 587

Hybrid Hash Join, INL, Partition i

Ri ⋈ Si

Output Buffer

Load R Partition i

Partition i of S

Scan S Partition 0

Partition i of R

04/18/23 49PSU’s CS 587

Fudging in our treatment of Hash-Join

If we build an in-memory hash table to speed up the matching of tuples, a little more memory is needed.

If the hash function does not partition uniformly, one or more R partitions may not fit in memory. In this case we can apply the hash-join

technique recursively to do the join of this R-partition with its corresponding S-partition.

14. Op.Eval.

04/18/23 50PSU’s CS 587

Sort-Merge vs. Hash Join

When can we achieve the low cost of 3(M+N)? Hash: Smaller of M, N B2

Sort-merge: Both M, N B2

So Hash Join is preferable if table sizes differ greatly.

But Sort-Merge Join is less sensitive to data skew. The result of Sort Merge Join is ordered.

04/18/23 51PSU’s CS 587

Hash Join comments

Can view first phase of Hash Join of R, S as divide and conquer: partitioning the join into subjoins that can be done with one input smaller than memory.

What to do if we don’t know the size of R? Assume the case M B, then split off more and more of the hash table onto disk as it grows. Used in Postgres.

04/18/23 52PSU’s CS 587

Equijoin Cost Summary

Nested Loops best for small tables Others have high overhead costs

Nested Loops M + MN

Index Nested Loops M + M pR(1.2-4)(1+)

Sort-Merge, Hash 3(M+N)

04/18/23 53PSU’s CS 587

14.4.4 General Join Conditions

Equalities over several attributes (e.g., R.sid=S.sid AND R.rname=S.sname): For Index NL, build index on <sid, sname> (if

S is inner); or use existing indexes on sid or sname.

For Sort-Merge and Hash Join, sort/partition on combination of the two join columns.

Not much different than one-equality equijoins.

14. Op.Eval.

04/18/23 54PSU’s CS 587

Inequality Conditions

Typical example: R.rname < S.sname Relatively rare. Does the example make

sense? For Index NL, need (clustered!) B+ tree

index to get any efficiency.• Range probes on inner; # matches likely to be

much higher than for equality joins. Hash Join, Sort Merge Join not applicable. With no index, Block NL is the best join

method.

04/18/23 55PSU’s CS 587

14.5 Set Operations INTERSECTION:

INTERSECTION is a special case of join.• How is it a special case?

So all the usual join algorithms apply CROSS PRODUCT

Also a special case of join• How?

What join algorithm is best to compute CROSS PRODUCT?

• Will Sort-Merge or Hash help?

14. Op.Eval.

04/18/23 56PSU’s CS 587

UNION and EXCEPT They are similar; we’ll do UNION. The hard part is removing duplicates. Sorting based approach to union:

Sort both relations (on a key). Merge sorted relations, discarding duplicates. Alternative: Merge runs from Pass 0 for both

relations. Hash based approach to union:

Partition R and S using hash function h. For each S-partition, build in-memory hash table,

scan corresponding R-partition and add tuples to table while discarding duplicates.

04/18/23 57PSU’s CS 587

14.6 Aggregates w/o grouping Example

SELECT AVE(S.age) FROM SAILORS S In general, requires scanning the relation.

What is the cost for this query? Given index whose search key includes all

attributes in the SELECT or WHERE clauses, can do index-only scan. If there is an index on age, what is the cost of

this query?

14. Op.Eval.

04/18/23 58PSU’s CS 587

Aggregates with grouping Example

SELECT S.rating, AVE(S.age) FROM SAILORS S GROUP BY S.rating

What is an algorithm based on sorting? What is the cost of this query, assuming 2-pass sort? Optimizations? Cost?

What is an algorithm based on hashing? How much memory is needed? Cost?

Conclusion: Hash is best If there is an appropriate index, use index-only alg.

14. Op.Eval.

04/18/23 59PSU’s CS 587

14.7 Impact of Buffering If several operations are executing

concurrently, estimating the number of available buffer pages is guesswork.

Repeated access patterns interact with buffer replacement policy. e.g., Inner relation is scanned repeatedly in

Simple Nested Loop Join. With enough buffer pages to hold inner, replacement policy does not matter. Otherwise, MRU is best, LRU is worst (sequential flooding).

What replacement policy is best for • Block Nested Loops?• Index Nested Loops? • Sort-Merge Join sort phase?• Sort-Merge Join merge phase?

14. Op.Eval.