Many questions on database newsgroups and forums can be answered with uses of outer joins. Outer joins are part of the standard SQL language and supported by all RDBMS brands. Many programmers are expected to use SQL in their work, but few know how to use outer joins effectively. Learn to use this powerful feature of SQL, increase your employability, and amaze your friends! Karwin will explain outer joins, show examples, and demonstrate a Sudoku puzzle solver implemented in a single SQL query.
SQL Outer Joins for Fun and Profit
Bill Karwin Proprietor/Chief Architect
[email protected] www.karwin.com
2006-07-27 OSCON 2006 2
Introduction
- Overview of SQL joins: inner and outer
- Applications of outer joins
- Solving Sudoku puzzles with outer joins

Joins in SQL
- Joins:
  - The SQL way to express relations between data in tables
  - Form a new row in the result set from matching rows in each joined table
  - As fundamental to using a relational database as a loop is in other programming languages

Inner joins refresher
- ANSI SQL-89 syntax:
    SELECT ...
    FROM products p, orders o
    WHERE p.product_id = o.product_id;
- ANSI SQL-92 syntax:
    SELECT ...
    FROM products p JOIN orders o
      ON p.product_id = o.product_id;

Inner join example

Products
product_id
----------
Abc
Def
Efg

Orders
product_id  order_id
----------  --------
Abc         10
Abc         11
Def         9

Inner join example
Query result set:

product_id  Product attributes  order_id  Order attributes
----------  ------------------  --------  ----------------
Abc         $10.00              10        2006/2/1
Abc         $10.00              11        2006/3/10
Def         $5.00               9         2005/5/2

    SELECT ...
    FROM products p JOIN orders o
      ON p.product_id = o.product_id;

Outer joins
- Return all rows from one table, but only matching rows from the joined table. Return NULL where no row matches.
- Not supported in SQL-89
- SQL-92 syntax:
    SELECT ...
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id;

Types of outer joins
- LEFT OUTER JOIN
  Returns all rows from the table on the left. Returns NULLs in columns of the right table where no row matches.
- RIGHT OUTER JOIN
  Returns all rows from the table on the right. Returns NULLs in columns of the left table where no row matches.
- FULL OUTER JOIN
  Returns all rows from both tables. Returns NULLs in columns of each where no row matches.

Support for OUTER JOIN
Open-source RDBMS products:

                   LEFT OUTER JOIN  RIGHT OUTER JOIN
                   ---------------  ----------------
MySQL              ✓                ✓
PostgreSQL         ✓                ✓
Firebird           ✓                ✓
SQLite             ✓                ✓
Hypersonic HSQLDB  ✓                ✓
Apache Derby       ✓                ✓
Ingres R3          ✓                ✓

FULL OUTER JOIN: ✓ ✓ ✓ ✓ (four of the seven products)

Outer join example

Products
product_id
----------
Abc
Def
Efg

Orders
product_id  order_id
----------  --------
Abc         10
Abc         11
Def         9
NULL        NULL      (paired with Efg, which has no order)

Outer join example
Query result set:

product_id  Product attributes  order_id  Order attributes
----------  ------------------  --------  ----------------
Abc         $10.00              10        2006/2/1
Abc         $10.00              11        2006/3/10
Def         $5.00               9         2005/5/2
Efg         $17.00              NULL      NULL

    SELECT ...
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id;
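
The inner and outer join examples can be tried side by side. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as the engine; the `price` column and the ISO date strings are illustrative stand-ins for the "attributes" columns on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY, price REAL);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc', 10.00), ('Def', 5.00), ('Efg', 17.00);
    INSERT INTO orders VALUES
        (10, 'Abc', '2006-02-01'), (11, 'Abc', '2006-03-10'), (9, 'Def', '2005-05-02');
""")

# Inner join: only products that have at least one order.
inner_rows = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p JOIN orders o
      ON p.product_id = o.product_id
    ORDER BY p.product_id, o.order_id
""").fetchall()

# Left outer join: every product; order columns are NULL (None) when unmatched.
outer_rows = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    ORDER BY p.product_id, o.order_id
""").fetchall()

print(inner_rows)
print(outer_rows)
```

The only difference between the two result sets is the extra row for Efg, whose order_id comes back as None.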

So what?
- Difference seems trivial and uninteresting
- SQL works with sets and relations
- Operations on sets combine in powerful ways (just like operations on numbers, strings, or booleans)

[Figure: diagrams labeled INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN]

Solutions using outer joins
- Extra join conditions
- Subtotals per day
- Localization
- Mimic NOT IN (subquery)
- Greatest row per group
- Top three per group
- Finding attributes in EAV tables (entity-attribute-value)
- Sudoku puzzle solver

Extra join conditions
- Problem: match only with orders created this year.
- Put extra conditions on the outer-joined table (here, orders) into the ON clause. The conditions are then applied as part of the join, before unmatched products are filled in with NULLs, so every product still appears:
    SELECT ...
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
      AND o.date >= '2006-01-01';

Extra join conditions

Products
product_id
----------
Abc
Def
Efg

Orders
product_id  order_id  date
----------  --------  ---------
Abc         10        2006/2/1
Abc         11        2006/3/10
Def         9         2005/5/2
NULL        NULL      NULL       (paired with products that have no order this year)

Extra join conditions
Query result set:

product_id  Product attributes  order_id  Order attributes
----------  ------------------  --------  ----------------
Abc         $10.00              10        2006/2/1
Abc         $10.00              11        2006/3/10
Def         $5.00               NULL      NULL
Efg         $17.00              NULL      NULL

    SELECT ...
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
      AND o.date >= '2006-01-01';
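
Where the date test goes changes the result. The sketch below, assuming SQLite via Python's sqlite3 and ISO-formatted date strings (so that string comparison orders dates correctly), runs the query once with the condition in ON and once with it moved to WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO orders VALUES
        (10, 'Abc', '2006-02-01'), (11, 'Abc', '2006-03-10'), (9, 'Def', '2005-05-02');
""")

# Condition in ON: it only restricts which orders can match.
on_rows = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
      AND o.date >= '2006-01-01'
    ORDER BY p.product_id, o.order_id
""").fetchall()

# Condition in WHERE: it runs after the join and discards the NULL-extended rows.
where_rows = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    WHERE o.date >= '2006-01-01'
    ORDER BY p.product_id, o.order_id
""").fetchall()

print(on_rows)
print(where_rows)
```

With the condition in ON, Def and Efg are kept with a NULL order; with it in WHERE, they disappear and the query behaves like an inner join.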

Subtotals per day
- Problem: show all days, and the subtotal of orders per day, even when there are zero.
- Requires an additional table containing all dates in the desired range.
    SELECT d.date, COUNT(o.order_id)
    FROM days d LEFT OUTER JOIN orders o
      ON o.date = d.date
    GROUP BY d.date;

Subtotals per day

Days
date
---------
2005/5/2
. . .
2006/2/1
. . .
2006/3/10
. . .

Orders
date       order_id
---------  --------
2005/5/2   9
2006/2/1   10
2006/3/10  11
NULL       NULL      (paired with each day that has no order)

Subtotals per day
Query result set:

date       COUNT()
---------  -------
2005/5/2   1
. . .      0
2006/2/1   1
. . .      0
2006/3/10  1
. . .      0

    SELECT d.date, COUNT(o.order_id)
    FROM days d LEFT OUTER JOIN orders o
      ON o.date = d.date
    GROUP BY d.date;
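
A small runnable version of the per-day count follows. It is a sketch under assumptions: SQLite via Python's sqlite3, ISO date strings, and a hypothetical five-day range starting 2006-01-30 that is filled into the days table from Python.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE days (date TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, date TEXT);
    INSERT INTO orders VALUES (9, '2005-05-02'), (10, '2006-02-01'), (11, '2006-03-10');
""")

# Populate the calendar table with every day in the (hypothetical) reporting range.
start = date(2006, 1, 30)
conn.executemany(
    "INSERT INTO days (date) VALUES (?)",
    [((start + timedelta(days=i)).isoformat(),) for i in range(5)],
)

# COUNT(o.order_id) ignores NULLs, so days without orders count as 0.
subtotals = conn.execute("""
    SELECT d.date, COUNT(o.order_id)
    FROM days d LEFT OUTER JOIN orders o
      ON o.date = d.date
    GROUP BY d.date
    ORDER BY d.date
""").fetchall()

for day, count in subtotals:
    print(day, count)
```

Counting o.order_id rather than using COUNT(*) matters here: COUNT(*) would report 1 for an empty day, because the outer join still produces one NULL-extended row for it.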

Localization
- Problem: show translated messages, or the default-language message if no translation is available.
    SELECT en.message_id, COALESCE(sp.message, en.message)
    FROM messages AS sp RIGHT OUTER JOIN messages AS en
      ON sp.message_id = en.message_id
      AND sp.language = 'sp'
    WHERE en.language = 'en';
- The en.language = 'en' test belongs in WHERE: en is the preserved side of the RIGHT OUTER JOIN, so a condition on it inside ON would not filter out its non-English rows.
- COALESCE() returns its first non-null argument.

Localization

messages
message_id  language  message
----------  --------  ---------
123         en        Thank you
123         sp        Gracias
456         en        Hello
NULL                             (no 'sp' row exists for message 456)

Localization
Query result set:

message_id  message
----------  -------
123         Gracias
456         Hello

    SELECT en.message_id, COALESCE(sp.message, en.message)
    FROM messages AS sp RIGHT OUTER JOIN messages AS en
      ON sp.message_id = en.message_id
      AND sp.language = 'sp'
    WHERE en.language = 'en';
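
The fallback-to-English lookup can be checked with a short script. This sketch assumes SQLite via Python's sqlite3 and writes the join as a LEFT OUTER JOIN with the English rows on the left, which is the mirror image of the RIGHT OUTER JOIN form (older SQLite builds do not accept RIGHT OUTER JOIN). It also runs a variant that keeps the en.language test inside ON, to show why the test on the preserved side is placed in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        message_id INTEGER, language TEXT, message TEXT,
        PRIMARY KEY (message_id, language)
    );
    INSERT INTO messages VALUES
        (123, 'en', 'Thank you'), (123, 'sp', 'Gracias'), (456, 'en', 'Hello');
""")

# English rows are preserved; the Spanish text is used when it exists.
localized = conn.execute("""
    SELECT en.message_id, COALESCE(sp.message, en.message)
    FROM messages AS en LEFT OUTER JOIN messages AS sp
      ON sp.message_id = en.message_id
      AND sp.language = 'sp'
    WHERE en.language = 'en'
    ORDER BY en.message_id
""").fetchall()

# Variant: the en.language test inside ON does not remove the preserved
# side's Spanish row, so message 123 is reported twice.
filter_in_on = conn.execute("""
    SELECT en.message_id, COALESCE(sp.message, en.message)
    FROM messages AS en LEFT OUTER JOIN messages AS sp
      ON sp.message_id = en.message_id
      AND sp.language = 'sp'
      AND en.language = 'en'
    ORDER BY en.message_id
""").fetchall()

print(localized)
print(filter_in_on)
```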

Mimic NOT IN subquery
- Problem: find rows for which there is no match.
- Often implemented using NOT IN (subquery):
    SELECT ...
    FROM products p
    WHERE p.product_id NOT IN
      (SELECT o.product_id FROM orders o);

Mimic NOT IN subquery
- Can also be implemented using an outer join:
    SELECT ...
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    WHERE o.product_id IS NULL;
- Useful when subqueries are not supported (e.g. MySQL 4.0)

Mimic NOT IN subquery

Products
product_id
----------
Abc
Def
Efg

Orders
product_id  order_id
----------  --------
Abc         10
Abc         11
Def         9
NULL        NULL      (paired with Efg, which has no order)

Mimic NOT IN subquery
Query result set:

product_id  Product attributes  order_id  Order attributes
----------  ------------------  --------  ----------------
Efg         $17.00              NULL      NULL

    SELECT ...
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    WHERE o.product_id IS NULL;
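
Both forms can be compared directly. The sketch below assumes SQLite via Python's sqlite3. It also adds one hypothetical order row whose product_id is NULL, to illustrate a known difference between the two forms: a NULL in the subquery result makes NOT IN return no rows, while the outer-join form is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO orders VALUES (10, 'Abc'), (11, 'Abc'), (9, 'Def');
""")

NOT_IN_SQL = """
    SELECT p.product_id
    FROM products p
    WHERE p.product_id NOT IN (SELECT o.product_id FROM orders o)
"""
ANTI_JOIN_SQL = """
    SELECT p.product_id
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    WHERE o.product_id IS NULL
"""

not_in_rows = conn.execute(NOT_IN_SQL).fetchall()
anti_join_rows = conn.execute(ANTI_JOIN_SQL).fetchall()

# Hypothetical extra row: an order with no product recorded.
conn.execute("INSERT INTO orders VALUES (12, NULL)")
not_in_with_null = conn.execute(NOT_IN_SQL).fetchall()
anti_join_with_null = conn.execute(ANTI_JOIN_SQL).fetchall()

print(not_in_rows, anti_join_rows)
print(not_in_with_null, anti_join_with_null)
```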

Greatest row per group
- Problem: find the row in each group with the greatest value in one column.
    SELECT ...
    FROM products p
    JOIN orders o1
      ON p.product_id = o1.product_id
    LEFT OUTER JOIN orders o2
      ON p.product_id = o2.product_id
      AND o1.date < o2.date
    WHERE o2.product_id IS NULL;
- I.e., show the rows for which no other row exists with a greater date and the same product_id.

Greatest row per group

Orders o2
product_id  order_id  date
----------  --------  ---------
Abc         10        2006/2/1
Abc         11        2006/3/10
Def         9         2005/5/2

Orders o1
product_id  order_id  date
----------  --------  ---------
Abc         10        2006/2/1
Abc         11        2006/3/10
Def         9         2005/5/2

Products
product_id
----------
Abc
Def
Efg

NULL (stands in where the outer join finds no matching row)

Greatest row per group
Query result set:

product_id  Product attributes  order_id  Order attributes
----------  ------------------  --------  ----------------
Abc         $10.00              11        2006/3/10
Def         $5.00               9         2005/5/2

    SELECT ...
    FROM products p
    JOIN orders o1
      ON p.product_id = o1.product_id
    LEFT OUTER JOIN orders o2
      ON p.product_id = o2.product_id
      AND o1.date < o2.date
    WHERE o2.product_id IS NULL;
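
The latest-order-per-product query can be run as follows. This is a sketch assuming SQLite via Python's sqlite3 and ISO date strings; the selected columns are chosen for illustration, since the slide elides the select list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO orders VALUES
        (10, 'Abc', '2006-02-01'), (11, 'Abc', '2006-03-10'), (9, 'Def', '2005-05-02');
""")

# o1 is the candidate order; o2 looks for any later order of the same product.
# Keeping only rows where o2 found nothing leaves the latest order per product.
latest = conn.execute("""
    SELECT p.product_id, o1.order_id, o1.date
    FROM products p
    JOIN orders o1
      ON p.product_id = o1.product_id
    LEFT OUTER JOIN orders o2
      ON p.product_id = o2.product_id
      AND o1.date < o2.date
    WHERE o2.product_id IS NULL
    ORDER BY p.product_id
""").fetchall()

print(latest)
```

Efg is absent because the first join is an inner join: a product with no orders has no candidate row at all. If two orders of one product shared the same latest date, both would be returned.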

Top three per group
- Problem: list the largest three cities per US state.
    SELECT c.state, c.city_name, c.population
    FROM cities AS c LEFT JOIN cities AS c2
      ON c.state = c2.state
      AND c.population <= c2.population
    GROUP BY c.state, c.city_name, c.population
    HAVING COUNT(*) <= 3
    ORDER BY c.state, c.population DESC;
- I.e., show the cities for which the number of cities in the same state with an equal or greater population (counting the city itself) is less than or equal to three.

Top three per group

Cities c2
state  city_name      population
-----  -------------  ----------
CA     Los Angeles    3485K
CA     San Diego      1110K
CA     San Jose       782K
CA     San Francisco  724K

Cities c
state  city_name      population
-----  -------------  ----------
CA     Los Angeles    3485K
CA     San Diego      1110K
CA     San Jose       782K
CA     San Francisco  724K

Top three per group
Query result set:

state  city_name    population
-----  -----------  ----------
CA     Los Angeles  3485K
CA     San Diego    1110K
CA     San Jose     782K

    SELECT c.state, c.city_name, c.population
    FROM cities AS c LEFT JOIN cities AS c2
      ON c.state = c2.state
      AND c.population <= c2.population
    GROUP BY c.state, c.city_name, c.population
    HAVING COUNT(*) <= 3
    ORDER BY c.state, c.population DESC;
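
A runnable version of the top-three query, assuming SQLite via Python's sqlite3. Populations are stored as integers in thousands, which is an assumption made here so the comparison is numeric.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (state TEXT, city_name TEXT, population INTEGER);
    INSERT INTO cities VALUES
        ('CA', 'Los Angeles', 3485), ('CA', 'San Diego', 1110),
        ('CA', 'San Jose', 782), ('CA', 'San Francisco', 724);
""")

# Each city c is joined to every city c2 in its state that is at least as
# large (including c itself), so COUNT(*) is the city's rank within the state.
top_three = conn.execute("""
    SELECT c.state, c.city_name, c.population
    FROM cities AS c LEFT JOIN cities AS c2
      ON c.state = c2.state
      AND c.population <= c2.population
    GROUP BY c.state, c.city_name, c.population
    HAVING COUNT(*) <= 3
    ORDER BY c.state, c.population DESC
""").fetchall()

for row in top_three:
    print(row)
```

San Francisco is dropped because four cities (itself included) have a population at least as large, so its group count is 4.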

Fetching EAV attributes
- Entity-Attribute-Value table structure for dynamic attributes
- Not normalized schema design
- Lacks integrity enforcement
- Not scalable
- Nevertheless, EAV is used widely and is sometimes the only solution when attributes evolve quickly

Fetching EAV attributes

Attributes
product_id  attribute  value
----------  ---------  ----------
Abc         Media      DVD
Abc         Discs      2
Abc         Format     Widescreen
Abc         Length     108 min.

Products
product_id
----------
Abc
Def
Efg

Fetching EAV attributes
- Need an outer join per attribute:
    SELECT p.product_id,
      media.value AS media,
      discs.value AS discs,
      format.value AS format,
      length.value AS length
    FROM products AS p
    LEFT OUTER JOIN attributes AS media
      ON p.product_id = media.product_id AND media.attribute = 'Media'
    LEFT OUTER JOIN attributes AS discs
      ON p.product_id = discs.product_id AND discs.attribute = 'Discs'
    LEFT OUTER JOIN attributes AS format
      ON p.product_id = format.product_id AND format.attribute = 'Format'
    LEFT OUTER JOIN attributes AS length
      ON p.product_id = length.product_id AND length.attribute = 'Length'
    WHERE p.product_id = 'Abc';

Fetching EAV attributes
Query result set:

product_id  media  discs  format      length
----------  -----  -----  ----------  --------
Abc         DVD    2      Widescreen  108 min.

    SELECT p.product_id,
      media.value AS media,
      discs.value AS discs,
      format.value AS format,
      length.value AS length
    FROM products AS p
    LEFT OUTER JOIN attributes AS media
      ON p.product_id = media.product_id AND media.attribute = 'Media'
    LEFT OUTER JOIN attributes AS discs
      ON p.product_id = discs.product_id AND discs.attribute = 'Discs'
    LEFT OUTER JOIN attributes AS format
      ON p.product_id = format.product_id AND format.attribute = 'Format'
    LEFT OUTER JOIN attributes AS length
      ON p.product_id = length.product_id AND length.attribute = 'Length'
    WHERE p.product_id = 'Abc';
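
The one-join-per-attribute pattern can be exercised with a short script. This sketch assumes SQLite via Python's sqlite3 and stores every value as text; `fetch_attributes` is a hypothetical helper name. Running it for a product with no attribute rows shows why the joins are outer joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE attributes (product_id TEXT, attribute TEXT, value TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO attributes VALUES
        ('Abc', 'Media', 'DVD'), ('Abc', 'Discs', '2'),
        ('Abc', 'Format', 'Widescreen'), ('Abc', 'Length', '108 min.');
""")

PIVOT_SQL = """
    SELECT p.product_id, media.value, discs.value, format.value, length.value
    FROM products AS p
    LEFT OUTER JOIN attributes AS media
      ON p.product_id = media.product_id AND media.attribute = 'Media'
    LEFT OUTER JOIN attributes AS discs
      ON p.product_id = discs.product_id AND discs.attribute = 'Discs'
    LEFT OUTER JOIN attributes AS format
      ON p.product_id = format.product_id AND format.attribute = 'Format'
    LEFT OUTER JOIN attributes AS length
      ON p.product_id = length.product_id AND length.attribute = 'Length'
    WHERE p.product_id = ?
"""

def fetch_attributes(product_id):
    """Return one row with a column per attribute (None where it is missing)."""
    return conn.execute(PIVOT_SQL, (product_id,)).fetchone()

abc = fetch_attributes("Abc")
deff = fetch_attributes("Def")  # Def has no attribute rows at all
print(abc)
print(deff)
```

With inner joins instead, a product missing any single attribute would vanish from the result entirely.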

Sudoku puzzles
[Figure: example Sudoku puzzle grids]

Sudoku schema
    CREATE TABLE one_to_nine (
      value INTEGER NOT NULL
    );
    INSERT INTO one_to_nine (value) VALUES
      (1), (2), (3), (4), (5), (6), (7), (8), (9);

    CREATE TABLE sudoku (
      column INTEGER NOT NULL,
      row INTEGER NOT NULL,
      value INTEGER NOT NULL
    );
    INSERT INTO sudoku (column, row, value) VALUES
      (6,1,3), (8,1,5), (9,1,1),
      (1,2,1), (2,2,4), (5,2,7), (7,2,6),
      (2,3,8), (3,3,5), (4,3,9), (7,3,4), (9,3,2),
      (3,4,2), (4,4,3), (7,4,1), (9,4,7),
      (1,5,5), (2,5,3), (8,5,6),
      (1,6,9), (4,6,8), (5,6,6), (6,6,4), (8,6,2),
      (2,7,5), (4,7,1), (6,7,2), (8,7,8),
      (1,8,6), (3,8,7), (4,8,5), (8,8,9),
      (6,9,7), (7,9,3), (8,9,1);

Showing puzzle state
    SELECT GROUP_CONCAT(COALESCE(s.value, '_')
      ORDER BY x.value SEPARATOR ' ') AS `Puzzle_state`
    FROM one_to_nine AS x
    INNER JOIN one_to_nine AS y
    LEFT OUTER JOIN sudoku AS s
      ON s.column = x.value AND s.row = y.value
    GROUP BY y.value;

    +-------------------+
    | Puzzle_state      |
    +-------------------+
    | _ _ _ _ _ 3 _ 5 1 |
    | 1 4 _ _ 7 _ 6 _ _ |
    | _ 8 5 9 _ _ 4 _ 2 |
    | _ _ 2 3 _ _ 1 _ 7 |
    | 5 3 _ _ _ _ _ 6 _ |
    | 9 _ _ 8 6 4 _ 2 _ |
    | _ 5 _ 1 _ 2 _ 8 _ |
    | 6 _ 7 5 _ _ _ 9 _ |
    | _ _ _ _ _ 7 3 1 _ |
    +-------------------+

Revealing possible values
    SELECT x_loop.value AS x, y_loop.value AS y,
      GROUP_CONCAT(cell.value ORDER BY cell.value) AS possibilities
    -- Cartesian product: loop x over 1..9 columns,
    -- loop y over 1..9 rows, loop cell over 1..9 values
    FROM (one_to_nine AS x_loop
      INNER JOIN one_to_nine AS y_loop
      INNER JOIN one_to_nine AS cell)
    -- Is there any value already in the cell x, y?
    LEFT OUTER JOIN sudoku AS occupied
      ON (occupied.column = x_loop.value AND occupied.row = y_loop.value)
    -- Does the value appear in column x?
    LEFT OUTER JOIN sudoku AS num_in_col
      ON (num_in_col.column = x_loop.value AND num_in_col.value = cell.value)
    -- Does the value appear in row y?
    LEFT OUTER JOIN sudoku AS num_in_row
      ON (num_in_row.row = y_loop.value AND num_in_row.value = cell.value)
    -- Does the value appear in the sub-square containing x, y?
    LEFT OUTER JOIN sudoku AS num_in_box
      ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
      AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
      AND cell.value = num_in_box.value)
    -- Select for cases where all four outer joins find no matches
    WHERE COALESCE(occupied.value, num_in_col.value,
      num_in_row.value, num_in_box.value) IS NULL
    GROUP BY x_loop.value, y_loop.value;
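
The candidate search can be reproduced outside MySQL. The sketch below is an adaptation under stated assumptions, not the slide's exact query: it targets SQLite via Python's sqlite3, renames the columns to `col` and `row_no` to avoid keyword clashes, replaces CEIL(n/3) with the integer division (n - 1) / 3 (which assigns the same box index), and collects the candidates per cell in Python rather than with GROUP_CONCAT ... ORDER BY.

```python
import sqlite3
from collections import defaultdict

# Givens as (column, row, value), copied from the schema slide.
GIVENS = [
    (6,1,3), (8,1,5), (9,1,1), (1,2,1), (2,2,4), (5,2,7), (7,2,6),
    (2,3,8), (3,3,5), (4,3,9), (7,3,4), (9,3,2), (3,4,2), (4,4,3),
    (7,4,1), (9,4,7), (1,5,5), (2,5,3), (8,5,6), (1,6,9), (4,6,8),
    (5,6,6), (6,6,4), (8,6,2), (2,7,5), (4,7,1), (6,7,2), (8,7,8),
    (1,8,6), (3,8,7), (4,8,5), (8,8,9), (6,9,7), (7,9,3), (8,9,1),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one_to_nine (value INTEGER NOT NULL);
    CREATE TABLE sudoku (col INTEGER NOT NULL, row_no INTEGER NOT NULL,
                         value INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO one_to_nine VALUES (?)", [(n,) for n in range(1, 10)])
conn.executemany("INSERT INTO sudoku VALUES (?, ?, ?)", GIVENS)

CANDIDATES_SQL = """
    SELECT x_loop.value, y_loop.value, cell.value
    FROM one_to_nine AS x_loop
    CROSS JOIN one_to_nine AS y_loop
    CROSS JOIN one_to_nine AS cell
    LEFT OUTER JOIN sudoku AS occupied
      ON occupied.col = x_loop.value AND occupied.row_no = y_loop.value
    LEFT OUTER JOIN sudoku AS num_in_col
      ON num_in_col.col = x_loop.value AND num_in_col.value = cell.value
    LEFT OUTER JOIN sudoku AS num_in_row
      ON num_in_row.row_no = y_loop.value AND num_in_row.value = cell.value
    LEFT OUTER JOIN sudoku AS num_in_box
      ON (x_loop.value - 1) / 3 = (num_in_box.col - 1) / 3
      AND (y_loop.value - 1) / 3 = (num_in_box.row_no - 1) / 3
      AND cell.value = num_in_box.value
    WHERE COALESCE(occupied.value, num_in_col.value,
                   num_in_row.value, num_in_box.value) IS NULL
    ORDER BY 1, 2, 3
"""

# Map each empty cell (x, y) to the sorted list of values still allowed there.
possibilities = defaultdict(list)
for x, y, value in conn.execute(CANDIDATES_SQL):
    possibilities[(x, y)].append(value)

print(possibilities[(1, 1)])
```

Because every surviving row is one where all four outer joins found nothing, each (x, y, value) candidate appears exactly once.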

Revealing singleton values
    SELECT x_loop.value AS x, y_loop.value AS y,
      cell.value AS possibilities
    FROM (one_to_nine AS x_loop
      INNER JOIN one_to_nine AS y_loop
      INNER JOIN one_to_nine AS cell)
    LEFT OUTER JOIN sudoku AS occupied
      ON (occupied.column = x_loop.value AND occupied.row = y_loop.value)
    LEFT OUTER JOIN sudoku AS num_in_col
      ON (num_in_col.column = x_loop.value AND num_in_col.value = cell.value)
    LEFT OUTER JOIN sudoku AS num_in_row
      ON (num_in_row.row = y_loop.value AND num_in_row.value = cell.value)
    LEFT OUTER JOIN sudoku AS num_in_box
      ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
      AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
      AND cell.value = num_in_box.value)
    WHERE COALESCE(occupied.value, num_in_col.value,
      num_in_row.value, num_in_box.value) IS NULL
    GROUP BY x_loop.value, y_loop.value
    -- Limit the groups only to those with one value remaining
    HAVING COUNT(*) = 1;

Updating the puzzle
    -- Insert these singletons back into the table, then we can try again
    INSERT INTO sudoku (column, row, value)
    SELECT x_loop.value AS x, y_loop.value AS y,
      cell.value AS possibilities
    FROM (one_to_nine AS x_loop
      INNER JOIN one_to_nine AS y_loop
      INNER JOIN one_to_nine AS cell)
    LEFT OUTER JOIN sudoku AS occupied
      ON (occupied.column = x_loop.value AND occupied.row = y_loop.value)
    LEFT OUTER JOIN sudoku AS num_in_col
      ON (num_in_col.column = x_loop.value AND num_in_col.value = cell.value)
    LEFT OUTER JOIN sudoku AS num_in_row
      ON (num_in_row.row = y_loop.value AND num_in_row.value = cell.value)
    LEFT OUTER JOIN sudoku AS num_in_box
      ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
      AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
      AND cell.value = num_in_box.value)
    WHERE COALESCE(occupied.value, num_in_col.value,
      num_in_row.value, num_in_box.value) IS NULL
    GROUP BY x_loop.value, y_loop.value
    HAVING COUNT(*) = 1;
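
The "insert, then try again" step can be wrapped in a loop that repeats until a pass adds nothing new. The following is a sketch under assumptions, not the presenter's code: it uses SQLite via Python's sqlite3, renames the columns to `col` and `row_no`, substitutes (n - 1) / 3 for CEIL(n/3), and selects MIN(cell.value) so the grouped query stays valid in engines that reject bare columns. Filling single-candidate cells is not guaranteed to finish every puzzle, so the script only reports how many cells end up filled.

```python
import sqlite3

# Givens as (column, row, value), copied from the schema slide.
GIVENS = [
    (6,1,3), (8,1,5), (9,1,1), (1,2,1), (2,2,4), (5,2,7), (7,2,6),
    (2,3,8), (3,3,5), (4,3,9), (7,3,4), (9,3,2), (3,4,2), (4,4,3),
    (7,4,1), (9,4,7), (1,5,5), (2,5,3), (8,5,6), (1,6,9), (4,6,8),
    (5,6,6), (6,6,4), (8,6,2), (2,7,5), (4,7,1), (6,7,2), (8,7,8),
    (1,8,6), (3,8,7), (4,8,5), (8,8,9), (6,9,7), (7,9,3), (8,9,1),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one_to_nine (value INTEGER NOT NULL);
    CREATE TABLE sudoku (col INTEGER NOT NULL, row_no INTEGER NOT NULL,
                         value INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO one_to_nine VALUES (?)", [(n,) for n in range(1, 10)])
conn.executemany("INSERT INTO sudoku VALUES (?, ?, ?)", GIVENS)

INSERT_SINGLETONS_SQL = """
    INSERT INTO sudoku (col, row_no, value)
    SELECT x_loop.value, y_loop.value, MIN(cell.value)
    FROM one_to_nine AS x_loop
    CROSS JOIN one_to_nine AS y_loop
    CROSS JOIN one_to_nine AS cell
    LEFT OUTER JOIN sudoku AS occupied
      ON occupied.col = x_loop.value AND occupied.row_no = y_loop.value
    LEFT OUTER JOIN sudoku AS num_in_col
      ON num_in_col.col = x_loop.value AND num_in_col.value = cell.value
    LEFT OUTER JOIN sudoku AS num_in_row
      ON num_in_row.row_no = y_loop.value AND num_in_row.value = cell.value
    LEFT OUTER JOIN sudoku AS num_in_box
      ON (x_loop.value - 1) / 3 = (num_in_box.col - 1) / 3
      AND (y_loop.value - 1) / 3 = (num_in_box.row_no - 1) / 3
      AND cell.value = num_in_box.value
    WHERE COALESCE(occupied.value, num_in_col.value,
                   num_in_row.value, num_in_box.value) IS NULL
    GROUP BY x_loop.value, y_loop.value
    HAVING COUNT(*) = 1
"""

def count_filled():
    """Number of cells that currently hold a value."""
    return conn.execute("SELECT COUNT(*) FROM sudoku").fetchone()[0]

# Repeat the insert until a pass finds no cell with exactly one candidate.
filled = count_filled()
while True:
    conn.execute(INSERT_SINGLETONS_SQL)
    now_filled = count_filled()
    if now_filled == filled:
        break
    filled = now_filled

print(f"{filled} of 81 cells filled")
```

Each pass either fills at least one more cell or ends the loop, so the loop runs at most 81 times.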

Finish
- Outer joins are an indispensable part of SQL programming.

Thank you!