
SQL on Fire! Part 2: Tips and Tricks around SQL

Agenda Part I

SQL vs. SQL PL example Error handling Tables out of nowhere Pivoting Aggregation Deleting duplicate rows Alternatives to Min and Max

Part II Mass deletes Order processing Moving rows between tables Recursion Merge Calculating nesting levels

Easy → Difficult

Motivation – The OLTP Mantra

•Reduce Codepath

•Reduce Logical and Physical I/O

•Reduce Network Traffic

•Minimize Lock Contention

•Avoid Deadlocks

High performance starts with the application.

Mass deleting of rows

Problem:

Delete a large number of rows without excessive usage of log space.

Classic:

– Delete using cursor: very slow

– EXPORT remaining data, LOAD with REPLACE: outage while the table is unavailable.

Can do better!

Mass deleting of rows

CREATE PROCEDURE purgeInventory(IN dt DATE)
BEGIN
  DECLARE SQLCODE INTEGER;
loop:
  LOOP
    DELETE FROM (SELECT 1 FROM Inventory
                 WHERE InvDate <= dt
                 FETCH FIRST 1000 ROWS ONLY) AS D;
    IF SQLCODE = 100 THEN LEAVE loop; END IF;
    COMMIT;
  END LOOP loop;
END

CALL purgeInventory('2003-10-01')
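The batch-and-commit pattern above can be sketched outside DB2. A minimal illustration with Python's sqlite3 module (SQLite has no FETCH FIRST n ROWS ONLY inside DELETE, so a rowid IN (... LIMIT n) subquery plays that role; the table and dates are made up):

```python
import sqlite3

# Hypothetical schema mirroring the slide's Inventory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT, quantity INTEGER, invdate TEXT)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [("widget", i, "2003-09-01" if i % 2 == 0 else "2003-11-01") for i in range(5000)],
)
conn.commit()

def purge_inventory(conn, cutoff, batch=1000):
    """Delete qualifying rows in small batches, committing after each one
    so the log (journal) never holds more than `batch` deletes."""
    while True:
        cur = conn.execute(
            "DELETE FROM inventory WHERE rowid IN "
            "(SELECT rowid FROM inventory WHERE invdate <= ? LIMIT ?)",
            (cutoff, batch),
        )
        conn.commit()
        if cur.rowcount == 0:   # nothing left to delete -> done
            break

purge_inventory(conn, "2003-10-01")
print(conn.execute("SELECT COUNT(*) FROM inventory").fetchone()[0])  # 2500
```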

New order processing – classic

• "Submit order, provide reference # back"

• Retrieve next order #
  SELECT nextnum FROM ordermeta;                -- 1 I/O, S lock

• Increment next order #
  UPDATE ordermeta SET nextnum = nextnum + 1;   -- 1 I/O, S->X lock

• Insert new order
  INSERT INTO orders VALUES(nextnum, ...);      -- 1 I/O

• Return nextnum to user

• Deadlock risk, 3 SQL statements, 3 I/Os, single row

New order processing – improved

• Use SEQUENCE/IDENTITY
  INSERT INTO orders VALUES(NEXT VALUE FOR orderseq, ...);
  SET ordernum = PREVIOUS VALUE FOR orderseq;

• No Deadlock, 2 SQL statements, 1 I/O, single row only

• Use SELECT FROM INSERT
  SELECT ordernum INTO :ordernum
  FROM NEW TABLE(INSERT INTO orders VALUES(NEXT VALUE FOR orderseq, ...));

• No Deadlock, 1 SQL statement, 1 I/O, set oriented
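The sequence/identity idea is portable. A minimal sketch with Python's sqlite3, where an AUTOINCREMENT key stands in for DB2's SEQUENCE and lastrowid for PREVIOUS VALUE (the schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT plays the role of DB2's IDENTITY/SEQUENCE here.
conn.execute(
    "CREATE TABLE orders (ordernum INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT)"
)

def new_order(conn, item):
    # One statement inserts the row; the generated key comes back without
    # a separate SELECT/UPDATE on a counter table, so no S->X lock upgrade.
    cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
    conn.commit()
    return cur.lastrowid

a = new_order(conn, "book")
b = new_order(conn, "lamp")
print(a, b)  # 1 2
```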

New order processing – Optimal plan

Access Plan:
------------
RETURN (1)
  INSERT (4)
    TBSCAN (5)
      TABFNC: GENROW
    TABLE: ORDERS

Queue processing – Destructive read

• "Retrieve and delete next in line"

• Retrieve "oldest" row in queue
  SELECT ordernum, ... INTO :ordernum          -- 1 I/O, S lock
  FROM orders ORDER BY ordernum
  FETCH FIRST ROW ONLY;

• Delete the row
  DELETE FROM orders                           -- 1 I/O, S->X lock
  WHERE ordernum = :ordernum;

• Deadlock risk, 2 SQL statements, single row (or IN list)

Destructive read - Improved

• Delete through ORDER BY

  SELECT ordernum, ... INTO :ordernum, ...
  FROM OLD TABLE(DELETE FROM (SELECT * FROM orders
                              ORDER BY ordernum
                              FETCH FIRST ROW ONLY));

• No Deadlock, 1 I/O, set oriented
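SQLite has no SELECT FROM OLD TABLE(DELETE ...), so a sketch of the same destructive-read semantics needs a short transaction around a read and a delete (Python sqlite3; schema and payloads are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ordernum INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])
conn.commit()

def pop_oldest(conn):
    """Claim and remove the oldest order in one transaction, approximating
    the slide's single-statement SELECT FROM OLD TABLE(DELETE ...)."""
    with conn:  # BEGIN ... COMMIT around both statements
        row = conn.execute(
            "SELECT ordernum, payload FROM orders "
            "ORDER BY ordernum LIMIT 1").fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM orders WHERE ordernum = ?", (row[0],))
    return row

print(pop_oldest(conn))  # (1, 'a')
print(pop_oldest(conn))  # (2, 'b')
```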

Destructive Read – Optimal plan

Access Plan:
------------
RETURN (1)
  DELETE (4)
    IXSCAN (5)
      INDEX: i1
    TABLE: ORDERS

Queue processing – 2-phase

CREATE TABLE orders(ordernum INTEGER NOT NULL, agentid INTEGER);

CREATE UNIQUE INDEX orderind ON orders(ordernum ASC) INCLUDE (agentid ASC);

ALTER TABLE orders ADD CONSTRAINT PK PRIMARY KEY (ordernum);

Tip 1: Combine a unique and a non-unique index using INCLUDE.
Tip 2: Add the primary key after creating the index to control which index is used (index name and include columns).

Queue processing – 2 Phase

Ordernum AgentID

1031 15

1032 7

1033 20

1034 NULL

1035 NULL

1036 NULL

1037 NULL

Queue processing – claim order

SET vthisorder =
  (SELECT ordernum
   FROM OLD TABLE(UPDATE (SELECT ordernum, agentid FROM orders
                          WHERE agentid IS NULL
                          ORDER BY ordernum
                          FETCH FIRST ROW ONLY) AS U
                  SET agentid = vagentid));
COMMIT;
...; -- Long processing
DELETE FROM orders WHERE ordernum = vthisorder;
COMMIT;
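A sketch of the two-phase claim/delete protocol in Python's sqlite3 (SQLite lacks UPDATE through an ordered fullselect, so the oldest unclaimed row is found with MIN(); table contents are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ordernum INTEGER PRIMARY KEY, agentid INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1031, 15), (1034, None), (1035, None)])
conn.commit()

def claim_order(conn, agentid):
    """Phase 1: mark the oldest unclaimed order with our agent id.
    Phase 2 (after long processing) deletes it; a crash in between
    leaves the claim visible for recovery."""
    with conn:
        row = conn.execute(
            "SELECT MIN(ordernum) FROM orders WHERE agentid IS NULL").fetchone()
        if row[0] is None:
            return None
        conn.execute("UPDATE orders SET agentid = ? WHERE ordernum = ?",
                     (agentid, row[0]))
    return row[0]

def finish_order(conn, ordernum):
    with conn:
        conn.execute("DELETE FROM orders WHERE ordernum = ?", (ordernum,))

o = claim_order(conn, 7)
print(o)  # 1034
finish_order(conn, o)
print(claim_order(conn, 8))  # 1035
```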

Queue processing – claim order

Access Plan:
------------
RETURN (1)
  TBSCAN (2)
    SORT (3)
      UPDATE (4)
        IXSCAN (5)
          INDEX: ORDERIND
        TABLE: ORDERS

Moving duplicate rows

Task:

Delete rows from one table and insert into another

CREATE TABLE Archive LIKE Inventory;

WITH del(Item, Quantity, InvDate) AS
  (SELECT Item, Quantity, InvDate
   FROM OLD TABLE(DELETE FROM (SELECT Item, Quantity, InvDate,
                                      row_number() OVER(PARTITION BY Item
                                                        ORDER BY InvDate DESC) AS rn
                               FROM Inventory)
                  WHERE rn > 1)),
     ins(x) AS
  (SELECT 1 FROM NEW TABLE(INSERT INTO Archive SELECT * FROM del))
SELECT COUNT(1) FROM ins;
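The same move-the-duplicates logic, sketched with Python's sqlite3 (requires SQLite >= 3.25 for row_number(); without SELECT FROM DELETE the move is done row-by-rowid inside one transaction; the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (item TEXT, quantity INTEGER, invdate TEXT);
CREATE TABLE archive  (item TEXT, quantity INTEGER, invdate TEXT);
INSERT INTO inventory VALUES
  ('bolt', 5, '2006-01-01'), ('bolt', 7, '2006-02-01'),
  ('nut',  3, '2006-01-15'), ('bolt', 9, '2006-03-01');
""")

def archive_duplicates(conn):
    """Keep only the newest row per item; move the rest to archive.
    Uses row_number() just like the slide's DB2 query."""
    with conn:
        dup_rowids = [r[0] for r in conn.execute("""
            SELECT rowid FROM (
              SELECT rowid,
                     row_number() OVER (PARTITION BY item
                                        ORDER BY invdate DESC) AS rn
              FROM inventory)
            WHERE rn > 1""")]
        for rid in dup_rowids:
            conn.execute("INSERT INTO archive "
                         "SELECT item, quantity, invdate FROM inventory "
                         "WHERE rowid = ?", (rid,))
            conn.execute("DELETE FROM inventory WHERE rowid = ?", (rid,))
    return len(dup_rowids)

print(archive_duplicates(conn))  # 2
```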

Move duplicate rows

Do-At-Open Dam (1-row)

RETURN (1)
  TBSCAN (2)
    SORT (3)
      GRPBY (4)
        INSERT (5)
          DELETE (6)
            FETCH (7)
              FILTER (8)
                FETCH (9)
                  IXSCAN (10)
                    INDEX: SRIELAU
                  TABLE: INVENTORY
              TABLE: INVENTORY
            TABLE: INVENTORY
          TABLE: ARCHIVE

Recursion

Problem

Have a table of sales per working day.
Need a table of sales per calendar day.

DDL

CREATE TABLE Sales(day VARCHAR(10), date DATE, amount INTEGER)

Recursion

Day Date Amount

Friday 2006-May-12 30

Weekend

Monday 2006-May-15 20

Tuesday 2006-May-16 15

Wednesday 2006-May-17 25

Thursday 2006-May-18 31

Friday 2006-May-19 32

Weekend

Monday 2006-May-22 11

Tuesday 2006-May-23 18

Recursion – Produce a date range

CREATE FUNCTION dates(start DATE, end DATE)
RETURNS TABLE(dt DATE)
RETURN WITH rec(dt) AS
  (VALUES (start)
   UNION ALL
   SELECT dt + 1 DAY FROM rec WHERE dt < end)
SELECT dt FROM rec;

SELECT DAYNAME(date) AS day, date, COALESCE(amount, 0) AS amount
FROM TABLE(dates(DATE('2006-05-12'), DATE('2006-05-23'))) AS dates
LEFT OUTER JOIN sales ON dates.dt = sales.date;
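The recursive date generator translates almost directly to a recursive CTE in SQLite; a runnable sketch with Python's sqlite3 (the dates and amounts are a small made-up sample):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (date TEXT, amount INTEGER);
INSERT INTO sales VALUES ('2006-05-12', 30), ('2006-05-15', 20),
                         ('2006-05-16', 15);
""")

# Recursive CTE generating one row per calendar day, then an outer join
# fills the gaps with 0 -- the same shape as the slide's dates() function.
rows = conn.execute("""
WITH RECURSIVE dates(dt) AS (
  VALUES ('2006-05-12')
  UNION ALL
  SELECT date(dt, '+1 day') FROM dates WHERE dt < '2006-05-16'
)
SELECT dt, COALESCE(amount, 0)
FROM dates LEFT JOIN sales ON dates.dt = sales.date
ORDER BY dt""").fetchall()

for dt, amt in rows:
    print(dt, amt)
# the weekend days 2006-05-13 and 2006-05-14 show up with amount 0
```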

Recursion

Day Date Amount

Friday 2006-May-12 30

Saturday 2006-May-13 0

Sunday 2006-May-14 0

Monday 2006-May-15 20

Tuesday 2006-May-16 15

Wednesday 2006-May-17 25

Thursday 2006-May-18 31

Friday 2006-May-19 32

Saturday 2006-May-20 0

Sunday 2006-May-21 0

Monday 2006-May-22 11

Tuesday 2006-May-23 18

Recursion inside out

[Figure: recursive temp table – the seed rows, the read cursor walking the temp, and the rows appended by each recursive step]

1. Seed -> rec-temp
2. For each row in the temp, execute the recursive step and append the results to the temp
3. Finish when step 2 catches up with the appends


Rec-Plan

RETURN (1)
  TBSCAN (2)
    TEMP (3)
      UNION (4)
        TBSCAN (5)
          TEMP (3)
        TBSCAN (6)
          TABFNC: GENROW

Merge

•Unifies Update, Delete, Insert

•Procedural statement

• Set oriented processing per branch

• Consistency points at each branch

•SQL Standard

Merge Make-Up

MERGE INTO <target> USING <source> ON <match-condition>
{WHEN [NOT] MATCHED [AND <predicate>] THEN
   [UPDATE SET ... | DELETE | INSERT VALUES ... | SIGNAL ...]}
[ELSE IGNORE]

• <target> is any updatable query
• <source> is whatever query you please
• <match-condition> partitions <source> into MATCHED and NOT MATCHED
  Each target row must only be matched once!

• WHEN ... [<predicate>] executes THEN for a subset of [not] matched rows.

• Each row in <source> is processed once, in the first qualifying WHEN only

Update From

CREATE TABLE T(pk INT NOT NULL PRIMARY KEY, c1 INT);CREATE TABLE S(pk INT NOT NULL PRIMARY KEY, c1 INT);

Standard pre SQL4:
UPDATE T SET c1 = (SELECT c1 FROM S WHERE S.pk = T.pk)
WHERE pk IN (SELECT pk FROM S);
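The portable pre-MERGE update-from runs unchanged on most engines; a quick check with Python's sqlite3 (the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (pk INT NOT NULL PRIMARY KEY, c1 INT);
CREATE TABLE s (pk INT NOT NULL PRIMARY KEY, c1 INT);
INSERT INTO t VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO s VALUES (2, 99), (3, 88);
""")

# The portable pre-MERGE form: a correlated subquery computes the new value,
# and the IN predicate leaves unmatched rows in t untouched.
conn.execute("""
UPDATE t SET c1 = (SELECT c1 FROM s WHERE s.pk = t.pk)
WHERE pk IN (SELECT pk FROM s)""")
conn.commit()

print(conn.execute("SELECT pk, c1 FROM t ORDER BY pk").fetchall())
# [(1, 10), (2, 99), (3, 88)]
```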

IBM Informix / MS SQL Server:
UPDATE T SET c1 = S.c1 FROM T, S WHERE T.pk = S.pk;

Merge: Update From

RETURN
  UPDATE
    FILTER
      NLJOIN
        TBSCAN
          TABLE: S
        FETCH
          IXSCAN
            INDEX: T_PK
          TABLE: T
    TABLE: T

MERGE INTO T USING S ON T.pk = S.pk WHEN MATCHED THEN UPDATE SET c1 = S.c1;

Upsert

Standard pre SQL4 (one way of many)

FOR m AS SELECT T.pk tpk, S.pk spk, S.c1
         FROM T RIGHT JOIN S ON T.pk = S.pk
DO
  IF m.tpk IS NULL THEN
    INSERT INTO T VALUES(m.spk, m.c1);
  ELSE
    UPDATE T SET c1 = m.c1 WHERE pk = m.tpk;
  END IF;
END FOR;

RETURN
  INSERT
    TBSCAN
      TEMP
        UPDATE
          NLJOIN
            NLJOIN
              TBSCAN
                TABLE: S
              FETCH
                IXSCAN
                  INDEX: T_PK
                TABLE: T
            UNION
              FILTER
                TBSCAN
                  TABFNC: GENROW
              FILTER
                TBSCAN
                  TABFNC: GENROW
          TABLE: T
    TABLE: T

Merge Upsert

MERGE INTO T USING S ON T.pk = S.pk
WHEN MATCHED THEN UPDATE SET c1 = S.c1
WHEN NOT MATCHED THEN INSERT VALUES(pk, c1);

Merge single row Upsert

MERGE INTO T USING (VALUES(1, 2)) AS S(pk, c1)
ON S.pk = T.pk
WHEN MATCHED THEN UPDATE SET c1 = S.c1
WHEN NOT MATCHED THEN INSERT VALUES(pk, c1);
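SQLite has no MERGE, but its INSERT ... ON CONFLICT clause covers the same two-branch upsert; a minimal sketch in Python's sqlite3 (requires SQLite >= 3.24; the data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INT NOT NULL PRIMARY KEY, c1 INT)")
conn.execute("INSERT INTO t VALUES (1, 10)")

def upsert(conn, pk, c1):
    # ON CONFLICT splits the statement into the MATCHED (update) and
    # NOT MATCHED (insert) branches of the two-branch MERGE.
    with conn:
        conn.execute(
            "INSERT INTO t VALUES (?, ?) "
            "ON CONFLICT(pk) DO UPDATE SET c1 = excluded.c1",
            (pk, c1))

upsert(conn, 1, 99)   # matched -> update
upsert(conn, 2, 20)   # not matched -> insert
print(conn.execute("SELECT pk, c1 FROM t ORDER BY pk").fetchall())
# [(1, 99), (2, 20)]
```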

RETURN
  INSERT
    UPDATE
      NLJOIN
        NLJOIN
          TBSCAN
            TABFNC: GENROW
          TABLE: T
        UNION
          FILTER
            TBSCAN
              TABFNC: GENROW
          FILTER
            TBSCAN
              TABFNC: GENROW
      TABLE: T
    TABLE: T

Merge rules of engagement

Place the INSERT branch at the end.

Otherwise a second temp is needed between INSERT and UPDATE.

Aim for NLJOIN over the MERGE target in OLTP.

Other joins result in an exclusive lock on the target table. E.g. drop the optimization level to 3.

Calculating nesting levels

Problem

Given a log table with nested events.
How deep is the nesting at any given time?

DDL

CREATE TABLE log(timestamp TIMESTAMP, event CHAR(3), data VARCHAR(10));

Calculating nesting levels

Timestamp Event

2006-03-02 04:12:16.124565 In

2006-03-02 04:12:19.134514 In

2006-03-02 04:13:42.424085 Out

2006-03-02 04:15:31.872452 In

2006-03-02 04:16:42.004545 In

2006-03-02 04:17:01.994432 In

2006-03-02 04:17:23.569474 Out

2006-03-02 04:22:25.946465 Out

2006-03-02 04:32:19.543438 In

2006-03-02 04:33:58.172535 Out

2006-03-02 04:42:00.836468 Out

2006-03-02 04:46:51.643544 Out

Calculating nesting levels

Nesting Timestamp Event

> 2006-03-02 04:12:16.124565 In

|> 2006-03-02 04:12:19.134514 In

|< 2006-03-02 04:13:42.424085 Out

|> 2006-03-02 04:15:31.872452 In

||> 2006-03-02 04:16:42.004545 In

|||> 2006-03-02 04:17:01.994432 In

|||< 2006-03-02 04:17:23.569474 Out

||< 2006-03-02 04:22:25.946465 Out

||> 2006-03-02 04:32:19.543438 In

||< 2006-03-02 04:33:58.172535 Out

|< 2006-03-02 04:42:00.836468 Out

< 2006-03-02 04:46:51.643544 Out

Calculating nesting levels

SELECT REPEAT('|',
              SUM(CASE event WHEN 'In' THEN 1
                             WHEN 'Out' THEN -1 END)
                OVER (ORDER BY timestamp)
              + CASE event WHEN 'In' THEN -1 ELSE 0 END)
       || CASE event WHEN 'In' THEN '>'
                     WHEN 'Out' THEN '<' END AS nesting,
       timestamp,
       event
FROM Log
ORDER BY timestamp;
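The running-sum trick can be checked outside DB2. A minimal sketch with Python's sqlite3 (SQLite has no REPEAT(), so the bar string is built in Python; the timestamps and events are a made-up subset of the slide's data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log (timestamp TEXT, event TEXT);
INSERT INTO log VALUES
 ('04:12:16', 'In'), ('04:12:19', 'In'),  ('04:13:42', 'Out'),
 ('04:15:31', 'In'), ('04:16:42', 'In'),  ('04:17:01', 'In'),
 ('04:17:23', 'Out');
""")

# A running SUM of +1/-1 over the timeline gives the depth *after* each
# event; subtracting 1 on 'In' yields the depth at which the event occurred.
rows = conn.execute("""
SELECT SUM(CASE event WHEN 'In' THEN 1 ELSE -1 END)
         OVER (ORDER BY timestamp)
       + CASE event WHEN 'In' THEN -1 ELSE 0 END AS depth,
       event
FROM log ORDER BY timestamp""").fetchall()

for depth, event in rows:
    print('|' * depth + ('>' if event == 'In' else '<'), event)
```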

Appendix – V8 feature roadmap

FP2
– MERGE (prefer FP9 for performance optimizations)
– ORDER BY and FETCH FIRST in subquery

FP4 (TPC-C)
– SELECT FROM UPDATE/DELETE/INSERT
– UPDATE/DELETE/INSERT of subquery and ORDER BY

FP7
– CALL in "inline" SQL PL
– "native" SQL procedures (prefer FP9 for stability)
– DROP/SET DEFAULT and identity columns

FP9
– Automatic storage

Conclusion

•Exploit SQL to:

• Increase concurrency
• Reduce I/O
• Reduce code-path
• Make the application more readable

•SQL provides powerful support

Serge Rielau, IBM

srielau@ca.ibm.com
