ICS 421 Spring 2010 Query Evaluation (i)

ICS 421 Spring 2010

Query Evaluation (i)

Asst. Prof. Lipyeow Lim
Information & Computer Science Department

University of Hawaii at Manoa

2/9/2010 Lipyeow Lim -- University of Hawaii at Manoa

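[Figure: the query processing pipeline, annotated with a running example.]

Parse Query → Enumerate Plans → Estimate Cost → Choose Best Plan → Evaluate Query Plan → Result
(Enumerate Plans, Estimate Cost, and Choose Best Plan together form the Optimizer.)

Query: SELECT * FROM Reserves WHERE sid=101
• Parse Query: σ sid=101 (Reserves)
• Enumerate Plans: SCAN (sid=101) on Reserves; IDXSCAN (sid=101) on Reserves using Index(sid)
• Estimate Cost: 32.0 and 25.0
• Choose Best Plan: pick B
• Evaluate Query Plan: evaluate the chosen plan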

Parse Query
• Input: SQL
– E.g. SELECT-FROM-WHERE, CREATE TABLE, DROP TABLE statements
• Output: some data structure to represent the “query”
– Relational algebra?
• Also checks syntax, resolves aliases, and binds names in SQL to objects in the catalog
• How?

Enumerate Plans
• Input: a data structure representing the “query”
• Output: a collection of equivalent query evaluation plans
• Query Execution Plan (QEP): a tree of database operators
– High-level: RA operators are used
– Low-level: RA operators, each with a particular implementation algorithm
• Plan enumeration: find equivalent plans
– Different QEPs that return the same results
– Query rewriting: transformation of one QEP into another equivalent QEP

Estimate Cost
• Input: a collection of equivalent query evaluation plans
• Output: a cost estimate for each QEP in the collection
• Cost estimation: a mapping of a QEP to a cost
– Cost model: a model of what counts in the cost estimate, e.g. disk accesses, CPU cost, …
• Statistics about the data and the hardware are used.

Choose Best Plan
• Input: a collection of equivalent query evaluation plans and their cost estimates
• Output: the best QEP in the collection
• The steps enumerate plans, estimate cost, and choose best plan are collectively called the query optimizer
• Query optimizer:
– Explores the space of equivalent plans for a query
– Chooses the best plan according to a cost model
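The final optimizer step can be sketched in a few lines of Python. This is only an illustration under assumptions: the plan labels A and B, and the pairing of the example costs 32.0 and 25.0 with the table scan and the index scan, are assumed here rather than stated on the slide.

```python
# Hypothetical plan list: (label, description, estimated cost).
# The label-to-cost pairing is an assumption made for this example.
plans = [
    ("A", "SCAN (sid=101) on Reserves", 32.0),
    ("B", "IDXSCAN (sid=101) on Reserves using Index(sid)", 25.0),
]


def choose_best_plan(candidates):
    """Return the candidate plan with the lowest estimated cost."""
    return min(candidates, key=lambda plan: plan[2])


best = choose_best_plan(plans)
print(best[0], best[2])
```

A real optimizer would produce the candidate list by plan enumeration and the costs from a cost model; here both are simply written down.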

Evaluate Query Plan
• Input: a QEP (hopefully the best)
• Output: query results
• Often includes a “code generation” step to generate a lower-level QEP in executable “code”.
• The query evaluation engine is a “virtual machine” that executes some code representing the low-level QEP.

Query Execution Plans (QEPs)
• A tree of database operators: each operator is an RA operator with a specific implementation
• Selection σ: Index Scan or Table Scan
• Projection π:
– Without DISTINCT: Table Scan
– With DISTINCT: requires sorting or index scan
• Join ⋈:
– Nested loop joins (naïve)
– Index nested loop joins
– Sort merge joins
• Sort:
– In-memory sort
– External sort

QEP Examples

SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5

[Figure: plan trees for this query.]
– Selection above the join: π S.sname (σ S.rating>5 AND R.bid=100 (Reserves ⋈ R.sid=S.sid Sailors)). The annotated version labels both tables (SCAN), the join Nested Loop Join, and the operators above it On the fly.
– Selections pushed below the join: π S.sname ((σ R.bid=100 Reserves) ⋈ R.sid=S.sid (σ S.rating>5 Sailors)). The annotated version adds the labels (SCAN) on both tables, Temp T1, Nested Loop Join, and On the fly.

Access Paths
• An access path is a method of retrieving tuples. E.g. given a query with a selection condition:
– File or table scan
– Index scan
• Index matching problem: given a selection condition, which indexes can be used for the selection, i.e., match the selection?
– The selection condition is normalized to conjunctive normal form (CNF), where each term is a conjunct
– E.g. (day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3
– CNF: (day<8/9/94 OR bid=5 OR sid=3) AND (rname=‘Paul’ OR bid=5 OR sid=3)
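The CNF rewrite in the example can be checked by brute force. The sketch below treats each atomic comparison as a Boolean variable (the variable names are made up for this illustration) and compares the two forms on all 16 truth assignments.

```python
from itertools import product


def original(day_lt, rname_paul, bid5, sid3):
    # (day<8/9/94 AND rname='Paul') OR bid=5 OR sid=3
    return (day_lt and rname_paul) or bid5 or sid3


def cnf(day_lt, rname_paul, bid5, sid3):
    # (day<8/9/94 OR bid=5 OR sid=3) AND (rname='Paul' OR bid=5 OR sid=3)
    return (day_lt or bid5 or sid3) and (rname_paul or bid5 or sid3)


# Compare the two forms on every combination of truth values.
equivalent = all(
    original(*values) == cnf(*values)
    for values in product([False, True], repeat=4)
)
print(equivalent)
```

The two forms agree because OR distributes over AND: (A AND B) OR C is the same as (A OR C) AND (B OR C).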

[Figure: the plan with selections pushed below the join (SCAN on both tables, Temp T1, Nested Loop Join, On the fly), shown with an alternative access path for σ R.bid=100: IDXSCAN on Reserves using Index(R.bid).]

Index Matching

• A tree index matches a selection condition if the attributes in the selection condition form a prefix of the index search key.

• A hash index matches a selection condition if the selection condition has a term attribute=value for every attribute in the index search key.

I1: Tree Index (a,b,c)

I2: Tree Index (b,c,d)

I3: Hash Index (a,b,c)

Q1: a=5 AND b=3

Q2: a=5 AND b>6

Q3: b=3

Q4: a=5 AND b=3 AND c=5

Q5: a>5 AND b=3 AND c=5
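The two matching rules can be tried out on I1 to I3 and Q1 to Q5 with a small Python sketch. It applies the rules literally to the whole condition, which is a simplifying assumption: an optimizer may also use an index that matches only some of the conjuncts. The function names and the dictionary encoding of conditions are made up for this example.

```python
# Each condition maps an attribute to the comparison operator of its conjunct.
QUERIES = {
    "Q1": {"a": "=", "b": "="},
    "Q2": {"a": "=", "b": ">"},
    "Q3": {"b": "="},
    "Q4": {"a": "=", "b": "=", "c": "="},
    "Q5": {"a": ">", "b": "=", "c": "="},
}


def tree_matches(key, cond):
    """True if the condition's attributes form a prefix of the search key."""
    return len(cond) > 0 and set(cond) == set(key[:len(cond)])


def hash_matches(key, cond):
    """True if the condition has attribute=value for every key attribute."""
    return all(cond.get(attr) == "=" for attr in key)


I1 = ("a", "b", "c")  # tree index
I2 = ("b", "c", "d")  # tree index
I3 = ("a", "b", "c")  # hash index

for name, cond in QUERIES.items():
    print(name, tree_matches(I1, cond), tree_matches(I2, cond),
          hash_matches(I3, cond))
```

Under this literal reading, the hash index I3 is usable only when all three key attributes are bound by equality, while the tree index I1 is usable whenever the condition starts at attribute a.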

One Approach to Selections

• The selectivity of an access path is the size of the result set (in terms of tuples or pages).
– Sometimes selectivity is also used to mean reduction factor: the fraction of tuples in a table retrieved by the access path or selection condition.

• E.g. consider the selection: day<8/9/94 AND bid=5 AND sid=3
– Tree Index(day)
– Hash Index(bid,sid)

1. Find the most selective access path and retrieve tuples using it
2. Apply the remaining terms in the selection not matched by the chosen access path
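The two steps can be sketched in Python on toy data. Everything concrete here is an assumption for illustration: the Reserves tuples, the reduction factors attached to each access path, and the reading of 8/9/94 as 9 August 1994. The index lookup itself is simulated by a filter.

```python
from datetime import date

# Toy Reserves tuples (sid, bid, day); made-up data.
reserves = [
    (3, 5, date(1994, 7, 1)),
    (3, 5, date(1995, 1, 1)),
    (3, 7, date(1994, 6, 1)),
    (4, 5, date(1994, 5, 1)),
]

# The three conjuncts of the selection, keyed by attribute.
terms = {
    "day": lambda t: t[2] < date(1994, 8, 9),
    "bid": lambda t: t[1] == 5,
    "sid": lambda t: t[0] == 3,
}

# Access path -> (terms it matches, assumed reduction factor).
access_paths = {
    "Tree Index(day)": (["day"], 0.5),
    "Hash Index(bid,sid)": (["bid", "sid"], 0.1),
}


def evaluate(tuples):
    # Step 1: pick the most selective path (smallest reduction factor)
    # and retrieve tuples with it (simulated here by filtering).
    name = min(access_paths, key=lambda n: access_paths[n][1])
    matched = access_paths[name][0]
    retrieved = [t for t in tuples if all(terms[m](t) for m in matched)]
    # Step 2: apply the terms the chosen path did not match.
    rest = [f for attr, f in terms.items() if attr not in matched]
    return name, [t for t in retrieved if all(f(t) for f in rest)]


chosen, result = evaluate(reserves)
print(chosen, result)
```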

Join Algorithms
• Cost model
– Single DBMS server: I/Os in number of pages
– Distributed DBMS: network I/Os + local disk I/Os
– td: time to read/write one page to local disk
– ts: time to ship one page over the network to another node

• Single server:
– Nested Loop Join
– Index Nested Loop Join
– Sort Merge Join
– Hash Join

• Distributed:
– Semi-Join
– Bloom Join

Nested Loop Join

For each data page PS1 of S1
    For each tuple s in PS1
        For each data page PR1 of R1
            For each tuple r in PR1
                if (s.sid==r.sid) then output <s,r>

• Worst case number of local disk reads = Npages(S1) + |S1|*Npages(R1)

R1 (Reserves):
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1 (Sailors):
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0
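The loop structure and its read count can be reproduced with a minimal Python sketch. Relations are modelled as lists of pages; the page size of two tuples is an assumption for the example, and every pass over a page counts as one disk read (the worst case, with no buffering).

```python
# Relations as lists of pages; each page is a list of tuples.
# A page size of two tuples is assumed for illustration.
S1 = [
    [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5)],
    [(58, "Rusty", 10, 35.0)],
]
R1 = [
    [(22, 101, "10/10/96"), (58, 103, "11/12/96")],
]


def nested_loop_join(outer, inner):
    """Tuple-at-a-time nested loop join on sid, counting page reads."""
    reads = 0
    result = []
    for outer_page in outer:
        reads += 1  # read one page of the outer relation
        for s in outer_page:
            for inner_page in inner:
                reads += 1  # the inner relation is rescanned per outer tuple
                for r in inner_page:
                    if s[0] == r[0]:
                        result.append((s, r))
    return result, reads


result, reads = nested_loop_join(S1, R1)
n_tuples_s1 = sum(len(page) for page in S1)
print(len(result), reads, len(S1) + n_tuples_s1 * len(R1))
```

The counted reads equal Npages(S1) + |S1|*Npages(R1) by construction, since the inner relation is scanned once for every outer tuple.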

Index Nested Loop Join

For each data page PS1 of S1
    For each tuple s in PS1
        if (s.sid ∈ Index(R1.sid)) then fetch r & output <s,r>

• Worst case number of local disk reads with a tree index = Npages(S1) + |S1|*(1 + logF Npages(R1))
• Worst case number of local disk reads with a hash index = Npages(S1) + |S1|*2

R1 (Reserves), with Index(R1.sid):
sid  bid  day
22   101  10/10/96
58   103  11/12/96

S1 (Sailors):
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0
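A minimal Python sketch of the same idea, using an in-memory dictionary as a stand-in for Index(R1.sid). The dictionary is an assumption for illustration; a real index lives on disk and each probe costs the page reads given in the formulas above.

```python
from collections import defaultdict

S1 = [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)]
R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]


def build_index(relation):
    """Map each sid to the list of tuples carrying it (stand-in for an index)."""
    index = defaultdict(list)
    for row in relation:
        index[row[0]].append(row)
    return index


def index_nested_loop_join(outer, index):
    """Probe the index once per outer tuple instead of scanning the inner relation."""
    result = []
    for s in outer:
        for r in index.get(s[0], []):
            result.append((s, r))
    return result


joined = index_nested_loop_join(S1, build_index(R1))
print([(s[1], r[1]) for s, r in joined])
```

Compared with the plain nested loop, the inner scan is replaced by one index probe per outer tuple, which is where the |S1|*(probe cost) term comes from.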

Sort Merge Join

1. Sort S1 on sid
2. Sort R1 on sid
3. Compute the join on sid using the merging algorithm

• If the join attributes are relatively unique, the number of disk page accesses = Npages(S1) log Npages(S1) + Npages(R1) log Npages(R1) + Npages(S1) + Npages(R1)
• If the number of duplicates in the join attributes is large, the number of disk page accesses approaches that of nested loop join.

R1 (Reserves):
sid  bid  day
19   100  8/8/99
22   101  10/10/96
22   99   10/12/95
58   103  11/12/96

S1 (Sailors):
sid  sname   rating  age
22   Dustin  7       45.0
31   Lubber  8       55.5
58   Rusty   10      35.0
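The three steps can be sketched in Python as an in-memory version, assuming both relations fit in memory so that the built-in sort stands in for an external sort. The merge handles duplicate join values by pairing each outer tuple with the whole run of equal keys in the inner relation.

```python
S1 = [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)]
R1 = [(19, 100, "8/8/99"), (22, 101, "10/10/96"),
      (22, 99, "10/12/95"), (58, 103, "11/12/96")]


def sort_merge_join(left, right):
    """Join two relations on their first field (sid) by sorting and merging."""
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            # Find the run of right tuples that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            # Pair every left tuple carrying the key with the whole run.
            while i < len(left) and left[i][0] == key:
                for r in right[j:j_end]:
                    result.append((left[i], r))
                i += 1
            j = j_end
    return result


joined = sort_merge_join(S1, R1)
print([(s[0], r[1]) for s, r in joined])
```

With many duplicates the runs get long and the inner pairing loop dominates, which mirrors the remark that the cost then approaches that of a nested loop join.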

Distributed Joins

• Consider:
– Reserves join Sailors
• Depends on:
– Which node gets the query
– Whether tables are fragmented/partitioned or not
• Node 1 gets query
– Perform join at Node 3 (or 4) and ship results to Node 1?
– Ship tables to Node 1?
• Node 3 gets query
– Fetch Sailors in a loop?
– Cache Sailors locally?

[Figure: four nodes connected by the network. Node 1: Boats1, Node 2: Boats2, Node 3: Reserves, Node 4: Sailors.]

Distributed Joins over Fragments

R join S = σ R.sid=S.sid (R × S)
= σ R.sid=S.sid ((R1 ∪ R2) × (S1 ∪ S2))
= σ R.sid=S.sid ((R1 × S1) ∪ (R1 × S2) ∪ (R2 × S1) ∪ (R2 × S2))
= σ R.sid=S.sid (R1 × S1) ∪ σ R.sid=S.sid (R1 × S2) ∪ σ R.sid=S.sid (R2 × S1) ∪ σ R.sid=S.sid (R2 × S2)
= (R1 join S1) ∪ (R1 join S2) ∪ (R2 join S1) ∪ (R2 join S2)

[Figure: four nodes connected by the network. Node 1: Reserves1, Node 2: Reserves2, Node 3: Sailors1, Node 4: Sailors2.]

Equivalent to a union of joins over each pair of fragments

This equivalence applies to splitting a relation into pages in a single-server DBMS too!
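The equivalence can be checked on a small instance in Python. The fragment contents below are made up for illustration; relations are modelled as sets of tuples whose first field is sid.

```python
def join(R, S):
    """Equi-join on the first field (sid), as a set of (r, s) pairs."""
    return {(r, s) for r in R for s in S if r[0] == s[0]}


# Hypothetical fragments of Reserves (sid, bid) and Sailors (sid, sname).
R1 = {(22, 101), (58, 103)}
R2 = {(31, 102), (22, 104)}
S1 = {(22, "Dustin"), (31, "Lubber")}
S2 = {(58, "Rusty")}

whole = join(R1 | R2, S1 | S2)
pieces = join(R1, S1) | join(R1, S2) | join(R2, S1) | join(R2, S2)
print(whole == pieces, len(whole))
```

Some of the pairwise joins may be empty (here R2 join S2), which is what makes it worthwhile to think about which fragment pairs actually need to be joined.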

Distributed Nested Loop
• Consider performing R1 join S2 on Node 1
• Page-oriented nested loop join:

For each page r of R1
    Fetch r from local disk
    For each page s of S2
        Fetch s if s ∉ cache
        Output r join s

• Cost = Npages(R1)*td + Npages(R1)*Npages(S2)*(td + ts)
• If the cache can hold all of S2, cost is Npages(R1)*td + Npages(S2)*(td + ts)

[Figure: Node 1 and Node 2 connected by the network. For each R1 page r, Node 1 fetches each S2 page s from Node 2 and computes r join s.]
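The two cost formulas are easy to compare numerically. The sketch below evaluates them in Python; the page counts and the values of td and ts in the usage lines are arbitrary numbers chosen for illustration.

```python
def cost_no_cache(npages_r1, npages_s2, td, ts):
    """Every S2 page is read remotely and shipped once per R1 page."""
    return npages_r1 * td + npages_r1 * npages_s2 * (td + ts)


def cost_full_cache(npages_r1, npages_s2, td, ts):
    """Every S2 page is read remotely and shipped once, then served from cache."""
    return npages_r1 * td + npages_s2 * (td + ts)


# Hypothetical sizes and timings: 10 pages of R1, 5 pages of S2,
# td = 1 time unit per disk page, ts = 4 time units per shipped page.
print(cost_no_cache(10, 5, 1, 4))
print(cost_full_cache(10, 5, 1, 4))
```

With these numbers, caching all of S2 removes the factor Npages(R1) from the shipping term, which is the dominant part of the cost.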

Semijoins
• Consider performing R1 join S2 on Node 1
• S2 needs to be shipped to Node 1, where R1 resides
• Does every tuple in S2 join with R1?
• Semijoin:
– Don’t ship all of S2
– Ship only those S2 rows that will join with R1
– Assumes that the join causes a reduction in S2!
• Cost = Npages(R1)*td + Npages(πsid R1)*ts + Cost() + Npages(σ sid∈jsid S2)*ts + Cost(R1 join σ sid∈jsid S2)

[Figure: Node 1 ships πsid R1 to Node 2. Node 2 computes jsid, the sids common to πsid R1 and πsid S2, and ships σ sid∈jsid S2 back to Node 1, which computes R1 join σ sid∈jsid S2.]
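The message flow of a semijoin can be simulated in a single Python process. This is a sketch under assumptions: both "nodes" are just local lists, the tuples are example data, and the function name is made up; no networking is involved.

```python
# R1 lives on Node 1, S2 on Node 2 (both simulated as local lists).
R1 = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]
S2 = [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.0)]


def semijoin(r1, s2):
    # Node 1: project R1 onto sid and ship the projection to Node 2.
    sids_r1 = {r[0] for r in r1}
    # Node 2: keep only the S2 rows whose sid appears in the shipped set.
    reduced_s2 = [s for s in s2 if s[0] in sids_r1]
    # Node 2 ships reduced_s2 back; Node 1 finishes the join locally.
    joined = [(r, s) for r in r1 for s in reduced_s2 if r[0] == s[0]]
    return sids_r1, reduced_s2, joined


shipped_sids, shipped_rows, joined = semijoin(R1, S2)
print(len(S2), len(shipped_rows), len(joined))
```

Only two of the three S2 rows cross the network here; the saving pays off only when the reduction in S2 outweighs the cost of shipping the sid projection.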

Bloomjoins
• Consider performing R1 join S2 on Node 1
• Can we do better than semijoin?
• Bloomjoin:
– Don’t ship all of (πsid R1)
– Node 1: ship a “bloom filter” (like a signature) of (πsid R1)
  • Hash each sid
  • Set the bit for the hash value in a bit vector
  • Send the bit vector v1
– Node 2:
  • Hash each sid in (πsid S2) to bit vector v2
  • Compute (v1 ∧ v2)
  • Send rows of S2 in the intersection
• False positives

[Figure: Node 1 computes v1 = Bloom(πsid R1) and ships it to Node 2. Node 2 computes v2 = Bloom(πsid S2) and jsid = v1 ∧ v2, then ships σ sid∈jsid S2 back to Node 1, which computes R1 join σ sid∈jsid S2.]
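A minimal Python sketch of the bit-vector exchange, with one hash function as on the slide. The vector length, the hash function (sid mod 8), and the extra Sailors row with sid 30 are assumptions chosen so that a false positive shows up.

```python
M = 8  # bit-vector length; deliberately tiny so a false positive occurs


def h(sid):
    """Toy hash function mapping a sid to a bit position."""
    return sid % M


def bloom(sids):
    """Build a bit vector with the bit for each hashed sid set."""
    bits = [0] * M
    for sid in sids:
        bits[h(sid)] = 1
    return bits


R1 = [(22, 101), (58, 103)]                      # on Node 1
S2 = [(22, "Dustin"), (30, "Extra"),             # on Node 2; sid 30 is a
      (31, "Lubber"), (58, "Rusty")]             # hypothetical extra row

v1 = bloom(r[0] for r in R1)                     # Node 1 ships v1
v2 = bloom(s[0] for s in S2)                     # Node 2 builds v2
both = [a & b for a, b in zip(v1, v2)]           # v1 AND v2
shipped = [s for s in S2 if both[h(s[0])]]       # rows sent back to Node 1
joined = [(r, s) for r in R1 for s in shipped if r[0] == s[0]]
print([s[0] for s in shipped], [s[0] for r, s in joined])
```

The row with sid 30 is shipped because it hashes to the same bit as sid 22, but it is dropped by the final join on Node 1, so false positives cost bandwidth without affecting correctness.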
