Bill Karwin
Software developer, consultant, trainer
Using MySQL since 2000
Senior Database Architect at SchoolMessenger
Author of SQL Antipatterns: Avoiding the Pitfalls of Database Programming
Oracle ACE Director
How to Query a Tree?
Hierarchical data
§ Organization charts
§ Categories and sub-categories
§ Parts explosion
§ Threaded discussions
https://commons.wikimedia.org/wiki/File:Staff_Organisation_Diagram,_1896.jpg
Adjacency List Example Data
comment_id  parent_id  author  comment
1           NULL       Fran    What’s the cause of this bug?
2           1          Ollie   I think it’s a null pointer.
3           2          Fran    No, I checked for that.
4           1          Kukla   We need to check valid input.
5           4          Ollie   Yes, that’s a bug.
6           4          Fran    Yes, please add a check
7           6          Kukla   That fixed it.
Can’t Easily Query Deep Trees
SELECT * FROM Comments c1
LEFT JOIN Comments c2 ON (c2.parent_id = c1.comment_id)
LEFT JOIN Comments c3 ON (c3.parent_id = c2.comment_id)
LEFT JOIN Comments c4 ON (c4.parent_id = c3.comment_id)
LEFT JOIN Comments c5 ON (c5.parent_id = c4.comment_id)
LEFT JOIN Comments c6 ON (c6.parent_id = c5.comment_id)
LEFT JOIN Comments c7 ON (c7.parent_id = c6.comment_id)
LEFT JOIN Comments c8 ON (c8.parent_id = c7.comment_id)
LEFT JOIN Comments c9 ON (c9.parent_id = c8.comment_id)
LEFT JOIN Comments c10 ON (c10.parent_id = c9.comment_id)
...
MySQL Workarounds
MySQL lacked support for recursive queries, so workarounds were needed:
§ Path enumeration
§ Nested sets
§ Closure table
These are all denormalized designs, and most don’t have referential integrity.
Path Enumeration Example Data
comment_id  path      author  comment
1           1/        Fran    What’s the cause of this bug?
2           1/2/      Ollie   I think it’s a null pointer.
3           1/2/3/    Fran    No, I checked for that.
4           1/4/      Kukla   We need to check valid input.
5           1/4/5/    Ollie   Yes, that’s a bug.
6           1/4/6/    Fran    Yes, please add a check
7           1/4/6/7/  Kukla   That fixed it.
Path Enumeration Example Queries
Query ancestors of comment #7:
SELECT * FROM Comments
WHERE '1/4/6/7/' LIKE CONCAT(path, '%');
Query descendants of comment #4:
SELECT * FROM Comments
WHERE path LIKE '1/4/%';
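The two path enumeration queries above can be tried end to end with a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, so the prefix test is written as path || '%' instead of CONCAT(path, '%'); the table contents are the example data from the slides, with the comment text left out.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, path TEXT, author TEXT)"
)
conn.executemany("INSERT INTO Comments VALUES (?, ?, ?)", [
    (1, "1/", "Fran"), (2, "1/2/", "Ollie"), (3, "1/2/3/", "Fran"),
    (4, "1/4/", "Kukla"), (5, "1/4/5/", "Ollie"), (6, "1/4/6/", "Fran"),
    (7, "1/4/6/7/", "Kukla"),
])

# Ancestors of comment #7: rows whose path is a prefix of '1/4/6/7/'.
# SQLite spells CONCAT(path, '%') as path || '%'.
ancestors = [row[0] for row in conn.execute(
    "SELECT comment_id FROM Comments WHERE '1/4/6/7/' LIKE path || '%' ORDER BY path"
)]

# Descendants of comment #4: rows whose path starts with '1/4/'.
descendants = [row[0] for row in conn.execute(
    "SELECT comment_id FROM Comments WHERE path LIKE '1/4/%' ORDER BY path"
)]

print(ancestors)    # [1, 4, 6, 7]
print(descendants)  # [4, 5, 6, 7]
```

Note that both patterns also match the starting comment itself, because '%' matches the empty string.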
Path Enumeration Pros and Cons
Pros:
§ Single non-recursive query to get a tree or a subtree
Cons:
§ Complex updates to add or remove a node
§ Numbers are stored in a string, so there is no referential integrity
Nested Sets
Each comment encodes its descendants using two numbers:
§ A comment’s left number is less than all numbers used by the comment’s descendants.
§ A comment’s right number is greater than all numbers used by the comment’s descendants.
§ A comment’s numbers fall between the left and right numbers of each of the comment’s ancestors.
References:
§ “Recursive Hierarchies: The Relational Taboo!” Michael J. Kamfonas, Relational Journal, Oct/Nov 1992
§ “Trees and Hierarchies in SQL For Smarties,” Joe Celko, 2004
§ “Managing Hierarchical Data in MySQL,” Mike Hillyer, 2005
Nested Sets Example Data
comment_id  nsleft  nsright  author  comment
1           1       14       Fran    What’s the cause of this bug?
2           2       5        Ollie   I think it’s a null pointer.
3           3       4        Fran    No, I checked for that.
4           6       13       Kukla   We need to check valid input.
5           7       8        Ollie   Yes, that’s a bug.
6           9       12       Fran    Yes, please add a check
7           10      11       Kukla   That fixed it.
Nested Sets Example Queries
Query ancestors of comment #7:
SELECT ancestor.* FROM Comments child
JOIN Comments ancestor
  ON child.nsleft BETWEEN ancestor.nsleft AND ancestor.nsright
WHERE child.comment_id = 7;
Query subtree under comment #4:
SELECT descendant.* FROM Comments parent
JOIN Comments descendant
  ON descendant.nsleft BETWEEN parent.nsleft AND parent.nsright
WHERE parent.comment_id = 4;
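As a rough check of the nested sets queries above, the sketch below loads the example numbering into an in-memory SQLite database (an assumption made so the example is self-contained; the slides target MySQL) and runs both joins. The ORDER BY clauses are added here only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, "
    "nsleft INTEGER, nsright INTEGER, author TEXT)"
)
conn.executemany("INSERT INTO Comments VALUES (?, ?, ?, ?)", [
    (1, 1, 14, "Fran"), (2, 2, 5, "Ollie"), (3, 3, 4, "Fran"),
    (4, 6, 13, "Kukla"), (5, 7, 8, "Ollie"), (6, 9, 12, "Fran"),
    (7, 10, 11, "Kukla"),
])

# Ancestors of comment #7: rows whose (nsleft, nsright) interval encloses #7's nsleft.
ancestors = [row[0] for row in conn.execute("""
    SELECT ancestor.comment_id
    FROM Comments child
    JOIN Comments ancestor
      ON child.nsleft BETWEEN ancestor.nsleft AND ancestor.nsright
    WHERE child.comment_id = 7
    ORDER BY ancestor.nsleft
""")]

# Subtree under comment #4: rows whose nsleft falls inside #4's interval.
subtree = [row[0] for row in conn.execute("""
    SELECT descendant.comment_id
    FROM Comments parent
    JOIN Comments descendant
      ON descendant.nsleft BETWEEN parent.nsleft AND parent.nsright
    WHERE parent.comment_id = 4
    ORDER BY descendant.nsleft
""")]

print(ancestors)  # [1, 4, 6, 7]
print(subtree)    # [4, 5, 6, 7]
```

Ordering by nsleft returns the rows in depth-first tree order, which is convenient for rendering a threaded discussion.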
Nested Sets Pros and Cons
Pros:
§ Single non-recursive query to get a tree or a subtree
Cons:
§ Complex updates to add or remove a node
§ Numbers are not foreign keys, so there is no referential integrity
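To make the "complex updates" point concrete, here is one possible way to add a reply in a nested sets table. The slides do not show this step: the add_reply helper and the new comment #8 are hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL. The idea is to open a gap of two numbers at the parent's right edge and renumber everything at or beyond it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, "
    "nsleft INTEGER, nsright INTEGER, author TEXT)"
)
conn.executemany("INSERT INTO Comments VALUES (?, ?, ?, ?)", [
    (1, 1, 14, "Fran"), (2, 2, 5, "Ollie"), (3, 3, 4, "Fran"),
    (4, 6, 13, "Kukla"), (5, 7, 8, "Ollie"), (6, 9, 12, "Fran"),
    (7, 10, 11, "Kukla"),
])

def add_reply(conn, new_id, parent_id, author):
    """Hypothetical helper: insert new_id as the last child of parent_id.

    Returns how many existing rows had their nsright renumbered.
    """
    (parent_right,) = conn.execute(
        "SELECT nsright FROM Comments WHERE comment_id = ?", (parent_id,)
    ).fetchone()
    # Shift every right number at or beyond the insertion point by 2.
    shifted = conn.execute(
        "UPDATE Comments SET nsright = nsright + 2 WHERE nsright >= ?",
        (parent_right,),
    ).rowcount
    # Shift every left number beyond the insertion point by 2.
    conn.execute(
        "UPDATE Comments SET nsleft = nsleft + 2 WHERE nsleft > ?",
        (parent_right,),
    )
    # The new leaf takes the two numbers that were just freed.
    conn.execute(
        "INSERT INTO Comments VALUES (?, ?, ?, ?)",
        (new_id, parent_right, parent_right + 1, author),
    )
    return shifted

shifted = add_reply(conn, 8, 5, "Ollie")  # hypothetical reply to comment #5
root = conn.execute(
    "SELECT nsleft, nsright FROM Comments WHERE comment_id = 1"
).fetchone()
subtree = [row[0] for row in conn.execute("""
    SELECT d.comment_id FROM Comments p
    JOIN Comments d ON d.nsleft BETWEEN p.nsleft AND p.nsright
    WHERE p.comment_id = 4 ORDER BY d.nsleft
""")]

print(shifted)  # 5
print(root)     # (1, 16)
print(subtree)  # [4, 5, 8, 6, 7]
```

Adding a single leaf rewrote five of the seven existing rows, including the root, which is the cost this slide is pointing at.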
Closure Table
Many-to-many table
Stores every path from each node to each of its descendants
A node even connects to itself
CREATE TABLE Closure (
  ancestor INT NOT NULL,
  descendant INT NOT NULL,
  length INT NOT NULL,
  PRIMARY KEY (ancestor, descendant),
  FOREIGN KEY (ancestor) REFERENCES Comments(comment_id),
  FOREIGN KEY (descendant) REFERENCES Comments(comment_id)
);
Closure Table Example Data
comment_id  author  comment
1           Fran    What’s the cause of this bug?
2           Ollie   I think it’s a null pointer.
3           Fran    No, I checked for that.
4           Kukla   We need to check valid input.
5           Ollie   Yes, that’s a bug.
6           Fran    Yes, please add a check
7           Kukla   That fixed it.
ancestor descendant length
1 1 0
1 2 1
1 3 2
1 4 1
1 5 2
1 6 2
1 7 3
2 2 0
2 3 1
3 3 0
4 4 0
4 5 1
4 6 1
4 7 2
5 5 0
6 6 0
6 7 1
7 7 0
Closure Table Example Queries
Query ancestors of comment #7:
SELECT c.* FROM Comments c
JOIN Closure t ON (c.comment_id = t.ancestor)
WHERE t.descendant = 7;
Query subtree under comment #4:
SELECT c.* FROM Comments c
JOIN Closure t ON (c.comment_id = t.descendant)
WHERE t.ancestor = 4;
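The closure table queries above can be exercised with the example rows. This is a minimal sketch that assumes SQLite (Python's sqlite3 module) in place of MySQL; the ORDER BY clauses are additions for deterministic output. The final step, adding a leaf by copying the parent's ancestor rows, is not shown on the slides and is included only as an illustration; comment #8 is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # SQLite stand-in for MySQL (assumption)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.execute("CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, author TEXT)")
conn.execute("""
    CREATE TABLE Closure (
        ancestor   INTEGER NOT NULL REFERENCES Comments(comment_id),
        descendant INTEGER NOT NULL REFERENCES Comments(comment_id),
        length     INTEGER NOT NULL,
        PRIMARY KEY (ancestor, descendant)
    )
""")
conn.executemany("INSERT INTO Comments VALUES (?, ?)", [
    (1, "Fran"), (2, "Ollie"), (3, "Fran"), (4, "Kukla"),
    (5, "Ollie"), (6, "Fran"), (7, "Kukla"),
])
conn.executemany("INSERT INTO Closure VALUES (?, ?, ?)", [
    (1, 1, 0), (1, 2, 1), (1, 3, 2), (1, 4, 1), (1, 5, 2), (1, 6, 2), (1, 7, 3),
    (2, 2, 0), (2, 3, 1), (3, 3, 0),
    (4, 4, 0), (4, 5, 1), (4, 6, 1), (4, 7, 2),
    (5, 5, 0), (6, 6, 0), (6, 7, 1), (7, 7, 0),
])

def ancestors_of(comment_id):
    # Farthest ancestor first, the comment itself (length 0) last.
    return [row[0] for row in conn.execute("""
        SELECT c.comment_id FROM Comments c
        JOIN Closure t ON c.comment_id = t.ancestor
        WHERE t.descendant = ? ORDER BY t.length DESC
    """, (comment_id,))]

def subtree_under(comment_id):
    return [row[0] for row in conn.execute("""
        SELECT c.comment_id FROM Comments c
        JOIN Closure t ON c.comment_id = t.descendant
        WHERE t.ancestor = ? ORDER BY t.length, c.comment_id
    """, (comment_id,))]

ancestors_7 = ancestors_of(7)
subtree_4 = subtree_under(4)

# Illustration only: hypothetical comment #8 replies to comment #5. Every
# ancestor of #5 becomes an ancestor of #8, one step farther away, plus the
# self-referencing row (8, 8, 0).
conn.execute("INSERT INTO Comments VALUES (8, 'Ollie')")
conn.execute("""
    INSERT INTO Closure (ancestor, descendant, length)
    SELECT ancestor, 8, length + 1 FROM Closure WHERE descendant = 5
    UNION ALL
    SELECT 8, 8, 0
""")
ancestors_8 = ancestors_of(8)

print(ancestors_7)  # [1, 4, 6, 7]
print(subtree_4)    # [4, 5, 6, 7]
print(ancestors_8)  # [1, 4, 5, 8]
```

Unlike the nested sets renumbering, adding a leaf here only inserts rows; no existing row changes.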
Closure Table Pros and Cons
Pros:
§ Single non-recursive query to get a tree or a subtree
§ Referential integrity!
Cons:
§ Extra table is required
§ Hierarchy is stored redundantly, too easy to mess up
§ Lots of joins to do most kinds of queries
WITHer Recursive Queries in MySQL?
SQL vendors gradually implemented SQL-99 WITH syntax:
§ IBM DB2 UDB 8 (Dec. 2002)
§ Microsoft SQL Server 2005 (Oct. 2005)
§ Sybase SQL Anywhere 11 (Aug. 2008)
§ Firebird 2.1 (Sep. 2008)
§ PostgreSQL 8.4 (Jul. 2009)
§ Oracle 11g release 2 (Sep. 2009)
§ Teradata (date and version of support unknown, at least 2009)
§ HSQLDB 2.3 (Jul. 2013)
§ SQLite 3.8.3.1 (Feb. 2014)
§ H2 (date and version unknown)
https://www.percona.com/blog/2014/02/11/wither-recursive-queries/
ANSI SQL Recursive Common Table Expression
WITH RECURSIVE cte_name (col_name, col_name, col_name) AS (
subquery base case
UNION ALL
subquery referencing cte_name
)
SELECT ... FROM cte_name ...
https://dev.mysql.com/doc/refman/8.0/en/with.html
Generating a Series of Numbers
WITH RECURSIVE MySeries (n) AS (
SELECT 1 AS n
UNION ALL
SELECT 1+n FROM MySeries WHERE n < 10
)
SELECT * FROM MySeries;
+------+
| n    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
|    7 |
|    8 |
|    9 |
|   10 |
+------+
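One way to picture how the number series is produced: the base case seeds a working set, and the recursive member is applied to the newest rows until it returns nothing. The sketch below puts a plain-Python version of that loop next to the same CTE run in SQLite. Both are assumptions made for illustration: recursive_series is a hypothetical helper, the loop is a simplified mental model rather than MySQL's actual execution strategy, and SQLite stands in for MySQL.

```python
import sqlite3

def recursive_series(limit):
    """Simplified mental model of a recursive CTE (hypothetical helper)."""
    result = []
    working = [1]                      # base case: SELECT 1 AS n
    while working:                     # stop when the recursive step yields no rows
        result.extend(working)
        # recursive step: SELECT 1+n FROM MySeries WHERE n < limit
        working = [n + 1 for n in working if n < limit]
    return result

conn = sqlite3.connect(":memory:")     # SQLite stand-in for MySQL (assumption)
sql_series = [row[0] for row in conn.execute("""
    WITH RECURSIVE MySeries (n) AS (
        SELECT 1 AS n
        UNION ALL
        SELECT 1+n FROM MySeries WHERE n < 10
    )
    SELECT n FROM MySeries ORDER BY n
""")]

print(sql_series)                          # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(recursive_series(10) == sql_series)  # True
```

The WHERE n < 10 condition is what ends the recursion: once the newest row is 10, the recursive member produces no further rows.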
Generating a Series of Dates
WITH RECURSIVE MyDates (d) AS (
SELECT CURRENT_DATE() AS d
UNION ALL
SELECT d + INTERVAL 1 DAY FROM MyDates
WHERE d < CURRENT_DATE() + INTERVAL 7 DAY
)
SELECT * FROM MyDates;
+------------+
| d          |
+------------+
| 2017-04-24 |
| 2017-04-25 |
| 2017-04-26 |
| 2017-04-27 |
| 2017-04-28 |
| 2017-04-29 |
| 2017-04-30 |
| 2017-05-01 |
+------------+
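The date series can be reproduced in a similar way. This sketch makes two assumptions that differ from the slide: it runs on SQLite, so date(d, '+1 day') replaces MySQL's d + INTERVAL 1 DAY, and it starts from a fixed date (the first date in the output above) instead of CURRENT_DATE() so that the result is repeatable.

```python
import sqlite3

START = "2017-04-24"  # fixed start date (assumption) instead of CURRENT_DATE()

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
dates = [row[0] for row in conn.execute("""
    WITH RECURSIVE MyDates (d) AS (
        SELECT date(:start) AS d
        UNION ALL
        SELECT date(d, '+1 day') FROM MyDates
        WHERE d < date(:start, '+7 day')
    )
    SELECT d FROM MyDates ORDER BY d
""", {"start": START})]

print(len(dates))           # 8
print(dates[0], dates[-1])  # 2017-04-24 2017-05-01
```

The series has eight rows rather than seven because the condition is tested on the previous row: the row for start + 6 days still satisfies d < start + 7 days, so it generates one more.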
Query ancestors of comment #7
WITH RECURSIVE CommentTree (comment_id, parent_id, author, comment, depth) AS
(
SELECT comment_id, parent_id, author, comment, 0 AS depth
FROM Comments
WHERE comment_id = 7
UNION ALL
SELECT c.comment_id, c.parent_id, c.author, c.comment, ct.depth+1
FROM CommentTree ct
JOIN Comments c ON (ct.parent_id = c.comment_id)
)
SELECT * FROM CommentTree;
Query subtree under comment #4
WITH RECURSIVE CommentTree (comment_id, parent_id, author, comment, depth) AS
(
SELECT comment_id, parent_id, author, comment, 0 AS depth
FROM Comments
WHERE comment_id = 4
UNION ALL
SELECT c.comment_id, c.parent_id, c.author, c.comment, ct.depth+1
FROM CommentTree ct
JOIN Comments c ON (ct.comment_id = c.parent_id)
)
SELECT * FROM CommentTree;
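Both recursive queries differ only in the direction of the join: the ancestors query follows parent_id upward, the subtree query follows it downward. The sketch below runs them against the adjacency list example data. It assumes SQLite through Python's sqlite3 module rather than MySQL 8.0, and it trims the column list to the ids and depth to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("""
    CREATE TABLE Comments (
        comment_id INTEGER PRIMARY KEY,
        parent_id  INTEGER REFERENCES Comments(comment_id),
        author     TEXT
    )
""")
conn.executemany("INSERT INTO Comments VALUES (?, ?, ?)", [
    (1, None, "Fran"), (2, 1, "Ollie"), (3, 2, "Fran"), (4, 1, "Kukla"),
    (5, 4, "Ollie"), (6, 4, "Fran"), (7, 6, "Kukla"),
])

# Walk upward: join each found row's parent_id to the parent's comment_id.
ANCESTORS = """
WITH RECURSIVE CommentTree (comment_id, parent_id, depth) AS (
    SELECT comment_id, parent_id, 0 FROM Comments WHERE comment_id = ?
    UNION ALL
    SELECT c.comment_id, c.parent_id, ct.depth + 1
    FROM CommentTree ct
    JOIN Comments c ON ct.parent_id = c.comment_id
)
SELECT comment_id, depth FROM CommentTree ORDER BY depth, comment_id
"""

# Walk downward: join each found row's comment_id to its children's parent_id.
SUBTREE = """
WITH RECURSIVE CommentTree (comment_id, parent_id, depth) AS (
    SELECT comment_id, parent_id, 0 FROM Comments WHERE comment_id = ?
    UNION ALL
    SELECT c.comment_id, c.parent_id, ct.depth + 1
    FROM CommentTree ct
    JOIN Comments c ON ct.comment_id = c.parent_id
)
SELECT comment_id, depth FROM CommentTree ORDER BY depth, comment_id
"""

ancestors = conn.execute(ANCESTORS, (7,)).fetchall()
subtree = conn.execute(SUBTREE, (4,)).fetchall()

print(ancestors)  # [(7, 0), (6, 1), (4, 2), (1, 3)]
print(subtree)    # [(4, 0), (5, 1), (6, 1), (7, 2)]
```

The upward walk stops by itself at the root, because a NULL parent_id never matches any comment_id in the join.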
Recursive CTE Pros and Cons
Pros:
§ ANSI SQL-99 Standard
§ Compatible with other SQL implementations
§ Works with Adjacency List (single source of authority)
§ Referential integrity!
Cons:
§ Not compatible with earlier MySQL versions
§ Use of materialized temporary tables may cause performance problems
MySQL CTE Implementation: 💯
Thanks to @MarkusWinand for his preview analysis based on 8.0.1-dmr
http://modern-sql.com/feature/with
ITIS: Sample Hierarchical Data
Integrated Taxonomic Information System (https://www.itis.gov/)
§ Biological database of species of animals, plants, fungi
§ One big tree of 544,954 nodes
§ Data comes in adjacency list & path enumeration format
§ I converted to closure table for query tests
ITIS Data Model
mysql> select * from longnames
where completename = 'Eschscholzia californica';
+--------+---------------------------+
| tsn    | completename              |
+--------+---------------------------+
|  18956 | Eschscholzia californica  |
+--------+---------------------------+
mysql> select * from hierarchy where TSN = '18956'\G
             TSN: 18956
      Parent_TSN: 18954
           level: 11
   ChildrenCount: 8
hierarchy_string: 202422-954898-846494-954900-846496-846504-18063-846547-18409-18880-18954-18956
Breadcrumbs Query
WITH RECURSIVE taxonomy AS (
  SELECT base.tsn, base.parent_tsn, 0 as depth
  FROM hierarchy base
  WHERE tsn = '18956'
  UNION ALL
  SELECT next.tsn, next.parent_tsn, t.depth+1
  FROM hierarchy next JOIN taxonomy t
  WHERE t.parent_tsn = next.tsn
)
SELECT * FROM taxonomy JOIN longnames USING (tsn)
ORDER BY depth DESC;
Breadcrumbs Query Result
+--------+------------+-------+--------------------------+
| tsn    | parent_tsn | depth | completename             |
+--------+------------+-------+--------------------------+
| 202422 |          0 |    11 | Plantae                  |
| 954898 |     202422 |    10 | Viridiplantae            |
| 846494 |     954898 |     9 | Streptophyta             |
| 954900 |     846494 |     8 | Embryophyta              |
| 846496 |     954900 |     7 | Tracheophyta             |
| 846504 |     846496 |     6 | Spermatophytina          |
|  18063 |     846504 |     5 | Magnoliopsida            |
| 846547 |      18063 |     4 | Ranunculanae             |
|  18409 |     846547 |     3 | Ranunculales             |
|  18880 |      18409 |     2 | Papaveraceae             |
|  18954 |      18880 |     1 | Eschscholzia             |
|  18956 |      18954 |     0 | Eschscholzia californica |
+--------+------------+-------+--------------------------+
12 rows in set (0.00 sec)
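The breadcrumbs query can be replayed on just the twelve rows shown in the result above. This is a small sketch, not the full 544,954-node ITIS data set: it assumes SQLite via Python's sqlite3 module in place of MySQL, loads only the tsn, parent_tsn, and completename values from the result table, and writes the recursive join with ON and the alias h.

```python
import sqlite3

# The twelve (tsn, parent_tsn, completename) rows from the result table above.
ROWS = [
    (202422, 0, "Plantae"), (954898, 202422, "Viridiplantae"),
    (846494, 954898, "Streptophyta"), (954900, 846494, "Embryophyta"),
    (846496, 954900, "Tracheophyta"), (846504, 846496, "Spermatophytina"),
    (18063, 846504, "Magnoliopsida"), (846547, 18063, "Ranunculanae"),
    (18409, 846547, "Ranunculales"), (18880, 18409, "Papaveraceae"),
    (18954, 18880, "Eschscholzia"), (18956, 18954, "Eschscholzia californica"),
]

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("CREATE TABLE hierarchy (tsn INTEGER PRIMARY KEY, parent_tsn INTEGER)")
conn.execute("CREATE TABLE longnames (tsn INTEGER PRIMARY KEY, completename TEXT)")
conn.executemany("INSERT INTO hierarchy VALUES (?, ?)", [(t, p) for t, p, _ in ROWS])
conn.executemany("INSERT INTO longnames VALUES (?, ?)", [(t, n) for t, _, n in ROWS])

# Start at the species and climb parent_tsn links; depth counts steps upward.
names = [row[0] for row in conn.execute("""
    WITH RECURSIVE taxonomy AS (
        SELECT base.tsn, base.parent_tsn, 0 AS depth
        FROM hierarchy base
        WHERE base.tsn = ?
        UNION ALL
        SELECT h.tsn, h.parent_tsn, t.depth + 1
        FROM hierarchy h
        JOIN taxonomy t ON t.parent_tsn = h.tsn
    )
    SELECT completename FROM taxonomy JOIN longnames USING (tsn)
    ORDER BY depth DESC
""", (18956,))]

breadcrumbs = " > ".join(names)
print(breadcrumbs)
```

The climb ends at Plantae because its parent_tsn is 0 and no row has tsn 0; ordering by depth descending puts the kingdom first and the species last.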
Breadcrumbs Query EXPLAIN Plan
§New note in Extra: "Recursive"
§Using index (covering index) for both base case and recursive case
§I can eliminate the filesort if I allow natural order (base case first)
§No "Using Temporary"? Not so fast…
+----+-------------+------------+--------+---------------+---------+---------+--------------+------+----------+-----------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref          | rows | filtered | Extra                       |
+----+-------------+------------+--------+---------------+---------+---------+--------------+------+----------+-----------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL         |    4 |   100.00 | Using where; Using filesort |
|  1 | PRIMARY     | longnames  | eq_ref | PRIMARY,tsn   | PRIMARY | 4       | taxonomy.tsn |    1 |   100.00 | NULL                        |
|  2 | DERIVED     | base       | ref    | TSN           | TSN     | 4       | const        |    1 |   100.00 | Using index                 |
|  3 | UNION       | t          | ALL    | NULL          | NULL    | NULL    | NULL         |    2 |   100.00 | Recursive; Using where      |
|  3 | UNION       | next       | ref    | TSN           | TSN     | 4       | t.parent_tsn |    1 |   100.00 | Using index                 |
+----+-------------+------------+--------+---------------+---------+---------+--------------+------+----------+-----------------------------+
Breadcrumbs Query Performance
mysql> SELECT * FROM SYS.STATEMENTS_WITH_TEMP_TABLES\G
                   query: WITH RECURSIVE `taxonomy` AS ( ... `tsn` ) ORDER BY `depth` DESC
                      db: itis
              exec_count: 1
           total_latency: 10.05 ms
       memory_tmp_tables: 1
         disk_tmp_tables: 0
avg_tmp_tables_per_query: 1
  tmp_tables_to_disk_pct: 0
              first_seen: 2017-04-24 22:07:56
               last_seen: 2017-04-24 22:07:56
                  digest: 8438633360bedce178823bb868589fd0
Breadcrumbs Query Stages
mysql> SELECT * FROM SYS.USER_SUMMARY_BY_STAGES;
+------+--------------------------------+-------+---------------+-------------+
| user | event_name                     | total | total_latency | avg_latency |
+------+--------------------------------+-------+---------------+-------------+
| root | stage/sql/System lock          |    40 | 6.62 ms       | 165.60 us   |
| root | stage/sql/Opening tables       |   191 | 3.16 ms       | 16.52 us    |
| root | stage/sql/checking permissions |    45 | 1.50 ms       | 33.44 us    |
| root | stage/sql/Creating sort index  |     1 | 239.63 us     | 239.63 us   |
| root | stage/sql/closing tables       |   191 | 191.03 us     | 1.00 us     |
| root | stage/sql/starting             |     2 | 188.44 us     | 94.22 us    |
| root | stage/sql/Sending data         |     6 | 138.96 us     | 23.16 us    |
| root | stage/sql/statistics           |     4 | 122.42 us     | 30.60 us    |
| root | stage/sql/query end            |   191 | 56.67 us      | 296.00 ns   |
| root | stage/sql/preparing            |     4 | 33.57 us      | 8.39 us     |
| root | stage/sql/freeing items        |     2 | 27.93 us      | 13.96 us    |
| root | stage/sql/optimizing           |     5 | 20.03 us      | 4.01 us     |
| root | stage/sql/executing            |     7 | 15.39 us      | 2.20 us     |
| root | stage/sql/removing tmp table   |     4 | 9.35 us       | 2.34 us     |
| root | stage/sql/init                 |     3 | 8.76 us       | 2.92 us     |
| root | stage/sql/Sorting result       |     2 | 4.16 us       | 2.08 us     |
| root | stage/sql/end                  |     3 | 1.93 us       | 644.00 ns   |
| root | stage/sql/cleaning up          |     2 | 1.43 us       | 715.00 ns   |
+------+--------------------------------+-------+---------------+-------------+
Tree Expansion Query
WITH RECURSIVE ancestors (tsn, parent_tsn) AS (
  SELECT h.tsn, h.parent_tsn
  FROM hierarchy AS h
  WHERE h.tsn = %s
  UNION ALL
  SELECT h.tsn, h.parent_tsn
  FROM hierarchy AS h
  JOIN ancestors AS base ON h.tsn = base.parent_tsn
),
breadcrumbs (tsn, parent_tsn, depth, breadcrumbs) AS (
  SELECT h.tsn, h.parent_tsn, 0 AS depth,
    CAST(LPAD(h.tsn, 8, '0') AS CHAR(255)) AS breadcrumbs
  FROM hierarchy AS h
  WHERE h.parent_tsn = 0
  UNION ALL
  SELECT h.tsn, h.parent_tsn, base.depth+1 AS depth,
    CONCAT(base.breadcrumbs, ',', LPAD(h.tsn, 8, '0'))
  FROM hierarchy AS h
  JOIN ancestors AS a ON h.tsn = a.tsn
  JOIN breadcrumbs AS base ON h.parent_tsn = base.tsn
)
SELECT l.tsn, l.completename, b.depth, b.breadcrumbs
FROM breadcrumbs AS b
JOIN longnames AS l ON b.tsn = l.tsn
UNION
SELECT l.tsn, l.completename, b.depth+1, CONCAT(b.breadcrumbs, ',', LPAD(h.tsn, 8, '0'))
FROM breadcrumbs AS b
JOIN hierarchy AS h ON b.tsn = h.parent_tsn
JOIN longnames AS l ON l.tsn = h.tsn
ORDER BY breadcrumbs
Tree Expansion Query EXPLAIN
+------+--------------+------------+--------+-------------+---------+-------------------+--------+----------+----------------------------------------------------+
| id   | select_type  | table      | type   | key         | key_len | ref               | rows   | filtered | Extra                                              |
+------+--------------+------------+--------+-------------+---------+-------------------+--------+----------+----------------------------------------------------+
|    1 | PRIMARY      | <derived2> | ALL    | NULL        | NULL    | NULL              | 250230 |   100.00 | Using where                                        |
|    1 | PRIMARY      | l          | eq_ref | PRIMARY     | 4       | b.tsn             |      1 |   100.00 | NULL                                               |
|    2 | DERIVED      | h          | index  | TSN         | 9       | NULL              | 500466 |    10.00 | Using where; Using index                           |
|    3 | UNION        | base       | ALL    | NULL        | NULL    | NULL              |  50046 |   100.00 | Recursive; Using where                             |
|    3 | UNION        | <derived4> | ALL    | NULL        | NULL    | NULL              |      4 |   100.00 | Using where; Using join buffer (Block Nested Loop) |
|    3 | UNION        | h          | ref    | TSN         | 9       | a.tsn,base.tsn    |      1 |   100.00 | Using index                                        |
|    4 | DERIVED      | h          | ref    | TSN         | 4       | const             |      1 |   100.00 | Using index                                        |
|    5 | UNION        | base       | ALL    | NULL        | NULL    | NULL              |      2 |   100.00 | Recursive; Using where                             |
|    5 | UNION        | h          | ref    | TSN         | 4       | base.parent_tsn   |      1 |   100.00 | Using index                                        |
|    8 | UNION        | h          | index  | TSN         | 9       | NULL              | 500466 |   100.00 | Using where; Using index                           |
|    8 | UNION        | l          | eq_ref | PRIMARY     | 4       | itis.h.TSN        |      1 |   100.00 | NULL                                               |
|    8 | UNION        | <derived2> | ref    | <auto_key0> | 5       | itis.h.Parent_TSN |     10 |   100.00 | NULL                                               |
| NULL | UNION RESULT | <union1,8> | ALL    | NULL        | NULL    | NULL              |   NULL |     NULL | Using temporary; Using filesort                    |
+------+--------------+------------+--------+-------------+---------+-------------------+--------+----------+----------------------------------------------------+
Maybe I need more indexes?
Unfortunately I ran out of time to analyze.
Tree Expansion Query Performance
mysql> SELECT * FROM SYS.STATEMENTS_WITH_TEMP_TABLES\G
                   query: WITH RECURSIVE `ancestors` ( ` ... `l` . `completename` , `b` .
                      db: itis
              exec_count: 1
           total_latency: 1.24 s
       memory_tmp_tables: 3
         disk_tmp_tables: 0
avg_tmp_tables_per_query: 3
  tmp_tables_to_disk_pct: 0
              first_seen: 2017-04-27 01:33:14
               last_seen: 2017-04-27 01:33:14
                  digest: 86c1417d2ff3679863db754eff425e94
Tree Expansion Query Stages
mysql> SELECT * FROM SYS.USER_SUMMARY_BY_STAGES;
+------+--------------------------------+-------+---------------+-------------+
| user | event_name                     | total | total_latency | avg_latency |
+------+--------------------------------+-------+---------------+-------------+
| root | stage/sql/Sending data         |    12 | 979.42 ms     | 81.62 ms    |
| root | stage/sql/System lock          |    40 | 6.34 ms       | 158.52 us   |
| root | stage/sql/Opening tables       |   191 | 3.34 ms       | 17.51 us    |
| root | stage/sql/checking permissions |    53 | 1.35 ms       | 25.45 us    |
| root | stage/sql/starting             |     2 | 356.31 us     | 178.16 us   |
| root | stage/sql/statistics           |    12 | 271.01 us     | 22.58 us    |
| root | stage/sql/closing tables       |   191 | 179.15 us     | 937.00 ns   |
| root | stage/sql/preparing            |    12 | 98.18 us      | 8.18 us     |
| root | stage/sql/query end            |   191 | 57.60 us      | 301.00 ns   |
| root | stage/sql/freeing items        |     2 | 47.93 us      | 23.96 us    |
| root | stage/sql/Creating sort index  |     1 | 37.38 us      | 37.38 us    |
| root | stage/sql/optimizing           |    13 | 30.60 us      | 2.35 us     |
| root | stage/sql/executing            |    13 | 30.27 us      | 2.33 us     |
| root | stage/sql/removing tmp table   |    14 | 24.44 us      | 1.74 us     |
| root | stage/sql/init                 |     3 | 14.78 us      | 4.93 us     |
| root | stage/sql/cleaning up          |     2 | 11.66 us      | 5.83 us     |
| root | stage/sql/Sorting result       |     2 | 3.67 us       | 1.84 us     |
| root | stage/sql/end                  |     3 | 3.04 us       | 1.01 us     |
+------+--------------------------------+-------+---------------+-------------+
Conclusions
§ Overall, MySQL 8 support for recursive CTE queries is worth the wait.
§ Exotic cases exist that are beyond any optimizer.
§ I'm excited to upgrade to MySQL 8.0.x ASAP!
§ Now that virtually all major SQL brands support recursive CTEs, we need developer tools and popular apps to use them!
License and Copyright
Copyright 2017 Bill Karwin
http://www.slideshare.net/billkarwin
Released under a Creative Commons 3.0 License: http://creativecommons.org/licenses/by-nc-nd/3.0/
You are free to share—to copy, distribute, and transmit this work, under the following conditions:
Attribution. You must attribute this work to Bill Karwin.
Noncommercial. You may not use this work for commercial purposes.
No Derivative Works. You may not alter, transform, or build upon this work.