Using MySQL Common Table Expressions and Window Functions ...€¦ · Common table expressions can be divided into two categories: non-recursive and recursive. Non-recursive common

Using MySQL Common Table Expressions and Window Functions [HOL2986]

Jesper Wisborg Krogh

Senior Principal Technical Support Engineer

Prerequisites

Note: This is not required when using the laptops at HOL2986 at Oracle OpenWorld

2018.

It is assumed the following software is already installed on the machine where you try

the examples in this workbook. If you are attending the hands-on labs session Using

MySQL Common Table Expressions and Window Functions [HOL2986] at Oracle

OpenWorld 2018, you do not need to do anything, and you can skip to the next section.

Software List

MySQL Server 8.0.12 or later

Recommended: MySQL Workbench 8.0.12 or later

Data List

The sakila database: http://downloads.mysql.com/docs/sakila-db.zip - for more

information about the sakila database, see also:

https://dev.mysql.com/doc/sakila/en/.

HOL Information

The following information is useful for the hands-on lab session at Oracle OpenWorld

2018:

Windows Username lab

Windows Password HandsOn18!

MySQL Root Username root@localhost

MySQL Root Password S3cr3t

MySQL Regular User Username oracle@localhost

MySQL Regular User Password Oracle2018

MySQL Schemas sakila

Tip: You can connect to MySQL using the pre-configured connection in MySQL

Workbench.

Common Table Expressions (CTEs)

A common table expression (also commonly known as a CTE) is a named subquery. The

common table expression is defined before the main query and can then be used as if it

http://downloads.mysql.com/docs/sakila-db.zip

https://dev.mysql.com/doc/sakila/en/

was a regular table. It is possible to reference a common table expression more than

once (unlike traditional MySQL subqueries) and a query can have several common table

expression. Additionally, there are in some cases a performance gain.

Common table expressions can be divided into two categories: non-recursive and

recursive. Non-recursive common table expressions are the simplest, so let’s start out

looking at those.

Non-Recursive

In the simple non-recursive form, a common table expression simply specify a subquery,

so you can reference it by name as it is a regular table inside the query. A common table

expression is defined using the WITH keyword (specify only WITH once even if you define

multiple common table expressions).

Basic Construct

As a simple example, consider a common table expression selecting the integer 1 as the

column i and making it available in the main part of the query through the name cte:

WITH cte(i) AS (SELECT 1)

SELECT * FROM cte;

Script: cte_basic.sql

The query consists of several parts:

WITH tells that you are defining a common table expression.

cte is the name of the expression and it has one column named i.

AS tells that the subquery used for the cte expression will be defined next

(between the parentheses).

Finally the main part of the query uses the cte common table expression as the

source of its data.

The output of the query is:

+---+

| i |

+---+

| 1 |

+---+

Granted, this example does not really show why you want to use common table

expressions, but it still displays all the parts of a non-recursive common table

expression. You can use this basic construct to create more complex examples.

Multiple Common Table Expressions in One Query

You can have multiple common table expressions in one query. You separate the

expressions using a comma. The expressions are processed in the order listed, so a given

expression can reference the expressions defined earlier. For example:

WITH cte_1(txt) AS (SELECT 'Hello'),

cte_2(txt) AS (SELECT CONCAT_WS(' ', cte_1.txt, 'World')

FROM cte_1)

SELECT txt FROM cte_2;

Script: cte_multiple.sql

Yes, this is a cumbersome way to create the string “Hello World”, but in some real world

queries, it can greatly simplify complex queries and can give a more linear workflow that

makes it easier to understand what the query does.

Recursive Common Table Expressions

While the non-recursive common table expressions are very useful as just shown, their

use is relatively limited. There is however also the possibility to use recursive

expressions. This is done by defining a seed query and use a UNION to define how

further rows are generated. The second part of the UNION refers to the common table

expression itself to read the previous generated row(s).

Sequences

The most basic recursive common table expression is to generate a sequence, for

example:

WITH RECURSIVE digits(i) AS

(SELECT 1

UNION

SELECT i+1

FROM digits

WHERE i < 10

)

SELECT i

FROM digits

ORDER BY i;

Script: cte_recursive_digits.sql

The recursion terminates when no rows are created, i.e. when the WHERE clause does

not match any of the previous generated rows. So, the query generates the digits 1

through 10 (the WHERE clause applies to the previous generated rows):

+----+

| i |

+----+

| 1 |

| 2 |

| 3 |

| 4 |

| 5 |

| 6 |

| 7 |

| 8 |

| 9 |

| 10 |

+----+

Tip: Both the seed and recursive parts can themselves include UNION.

This may not seem very useful at all, but now consider a sales query. Some

days/weeks/months may not have any sales data, so what do you do if you want to

include those days/weeks/months in the report? There is an example later in the lab

where this will be demonstrated; here let’s consider how to generate the dates for

Oracle OpenWorld 2018:

WITH RECURSIVE oow_days(DayNumber, Day) AS

(SELECT 0, '2018-10-21'

UNION

SELECT DayNumber + 1, `Day` + INTERVAL 1 DAY

FROM oow_days

WHERE `Day` < '2018-10-25'

)

SELECT DayNumber, DAYNAME(`Day`) AS DayName, `Day`

FROM oow_days

ORDER BY `Day`;

Script: cte_recursive_days.sql

The result of the query is:

+-----------+-----------+------------+

| DayNumber | DayName | Day |

+-----------+-----------+------------+

| 0 | Sunday | 2018-10-21 |

| 1 | Monday | 2018-10-22 |

| 2 | Tuesday | 2018-10-23 |

| 3 | Wednesday | 2018-10-24 |

| 4 | Thursday | 2018-10-25 |

+-----------+-----------+------------+

Terminating Recursion

What happens if you forget to add a WHERE clause in the recursive part of the common

table expression? That depends. When the common table expression generates a row

that already exists in the result set, the recursion will be default stop automatically. So,

if your common table expression naturally terminates, there is not any problems. An

example is:

WITH RECURSIVE months(MonthNumber, MonthName) AS

(SELECT MONTH('2018-01-01'), MONTHNAME('2018-01-01')

UNION

SELECT MONTH('2018-01-01' + INTERVAL MonthNumber MONTH),

MONTHNAME('2018-01-01' + INTERVAL MonthNumber MONTH)

FROM months

)

SELECT *

FROM Months

ORDER BY MonthNumber;

Script: cte_recursive_months.sql

(Yes, that is a roundabout way to generate the names of the months. It is left as an

exercise to generate the list is a more straight forward way.)

The result of the query is just 12 rows with the month names:

+-------------+-----------+

| MonthNumber | MonthName |

+-------------+-----------+

| 1 | January |

| 2 | February |

| 3 | March |

| 4 | April |

| 5 | May |

| 6 | June |

| 7 | July |

| 8 | August |

| 9 | September |

| 10 | October |

| 11 | November |

| 12 | December |

+-------------+-----------+

However, in general recursive common table expression do not terminate on their own.

Consider the original sequence of digits but without the WHERE clause:

WITH RECURSIVE digits(i) AS

(SELECT 1

UNION

SELECT i+1

FROM digits

)

SELECT i

FROM digits

ORDER BY i;

Script: cte_recursive_digits_infinite.sql

This query fails with an error 3636:

ERROR: 3636: Recursive query aborted after 1001 iterations. Try

increasing @@cte_max_recursion_depth to a larger value.

MySQL will for safety fail with an error after 1000 rows have been generated. This is to

protect you from queries running indefinitely (and requiring indefinite resources to

store the result) in case you forget to add a WHERE clause that ensures the recursion is

terminated and the recursion does not terminate naturally. The limit can be changed

using the cte_max_recursion_depth option.

You have three options, if you need your expression to generate more than 1000 rows:

Increase cte_max_recursion_depth globally for all connections.

Increase cte_max_recursion_depth for the session, i.e. for that one

connection.

Increase cte_max_recursion_depth just for that one query.

Tip: If you need to generate more than 1000 rows in a recursive common table

expression, remember to increase the value of the cte_max_recursion_depth option.

This can be set at the session level, so it only affects the one session, or it can be set as

an optimizer hint to affect only that one query. It is recommended to increase the value

at as limited scope as possible.

As an example, consider a case where you need all days in a three year period. This will

generate 1095 or 1096 (depending on whether a leap year is included) rows. Since it is

expected that more than 1000 rows are generated, you can use an optimizer hint to set

the value of cte_max_recursion_depth to for example 1100:

WITH RECURSIVE days(DayDate) AS

(SELECT /*+ SET_VAR(cte_max_recursion_depth = 1100) */

'2015-01-01'

UNION

SELECT DayDate + INTERVAL 1 DAY

FROM days

WHERE DayDate < '2017-12-31'

)

SELECT *

FROM days

ORDER BY DayDate;

Script: cte_recursive_days_3_years.sql

This far the data type of the data generated by the common table expressions have

been ignored. How does that work?

Common Table Expression and Data Types

The data type of each column in the generated tables is determined by the data type of

the data that is fetched. For recursive common table expressions, it is the seed query

that determines the data type. A simple way to demonstrate this is to use a recursive

expression where the seed query generates a string and then in the recursive part to

concatenate to it:

WITH RECURSIVE x(i, str) AS

(SELECT 1, 'abc'

UNION

SELECT i+1, CONCAT(str, 'def')

FROM x

WHERE i < 2

)

SELECT *

FROM x;

Script: cte_data_type.sql

Executing this query causes an error:

ERROR: 1406: Data too long for column 'abc' at row 1

The seed query generates the string “abc” causing the data type to be CHAR(3). When

the recursive part of the query tries to concatenate “def”, there is no room for the extra

characters, and the query fails (provided strict mode is enabled; otherwise the same

string is generated again terminating the recursion and giving the wrong result).

To avoid that issue, explicitly use CAST() in the seed query to get the required data

type:

WITH RECURSIVE x(i, str) AS

(SELECT 1, CAST('abc' AS CHAR(6))

UNION

SELECT i+1, CONCAT(str, 'def')

FROM x

WHERE i < 2

)

SELECT *

FROM x;

Script: cte_data_type_fixed.sql

Common table expressions can also be used with tables. Let’s look at that.

Querying Tables

The sakila database includes the film table with information about the films that can

be rented. There is also the actor table with actors and a join table, film_actor, so it is

possible to determine which actors are in which movies.

Let’s say you want to find all movies starring the actor Sidney Crowe (the movies and

actors included are not from the real world) and not only list the film id and title, but

also all of the actors included. This sounds like a simple query, but it quickly becomes

complex unless you break it up. A recursive common table expression is a nice way to

break it up into parts:

WITH RECURSIVE films (film_id, title, first_name, last_name) AS

( -- what movies did Sidney Crowe work in

SELECT f.film_id, f.title, a.first_name, a.last_name

FROM sakila.film f

INNER JOIN sakila.film_actor fa USING (film_id)

INNER JOIN sakila.actor a USING (actor_id)

WHERE a.first_name = 'SIDNEY' AND a.last_name='CROWE'

UNION -- What movies did they then work in

SELECT f.film_id, f.title, a.first_name, a.last_name

FROM films f

INNER JOIN sakila.film_actor fa USING (film_id)

INNER JOIN sakila.actor a USING (actor_id)

)

SELECT film_id, title,

GROUP_CONCAT(

CONCAT(first_name, ' ', last_name)

ORDER BY last_name, first_name

SEPARATOR ', '

) AS Actors

FROM films

GROUP BY film_id, title;

Script: cte_recursive_actors.sql

The seed part of the common table expression finds all movies starring Sidney Crowe.

The recursive part finds all of the other actors in each of the movies. This is an example

where the seed query generates more than one row – that is perfectly fine; the

recursive query will apply to all of the rows found by the seed query. There is no need

for a WHERE clause in this case as the second iteration will generate the same rows as the

first iteration and thus cause the recursion to terminate as by default UNION DISTINCT

is used.

With the films and actors available in the films common table expression, it is easy to

aggregate the result. The query returns 34 films of which the two first and the last are

(the vertical output format has been used here due to the limitation of the page width):

*************************** 1. row ***************************

film_id: 12

title: ALASKA PHANTOM

Actors: VAL BOLGER, SIDNEY CROWE, SYLVESTER DERN, ALBERT JOHANSSON,

GENE MCKELLEN, BURT POSEY, JEFF SILVERSTONE

*************************** 2. row ***************************

film_id: 15

title: ALIEN CENTER

Actors: SIDNEY CROWE, BURT DUKAKIS, MENA HOPPER, KENNETH PALTROW,

RENEE TRACY, HUMPHREY WILLIS

…

*************************** 34. row ***************************

film_id: 994

title: WYOMING STORM

Actors: HARRISON BALE, SIDNEY CROWE, WOODY HOFFMAN, PENELOPE MONROE,

BETTE NICHOLSON, JULIA ZELLWEGER

Feel free to play more with common table expressions. There will be more examples

while discussing window functions. For now, let’s take a look at window functions.

Window Functions

Window functions are analytical functions that can be used to obtain derived values

from the data in a query. You can obtain all of this by handling the result in your

application, however by using window functions you can keep the logic of the query and

analysis together. Choose which works best for your separation between database and

application logic.

Window functions are related to aggregate functions. The difference is that where an

aggregate function works on the total result (using the fields in the GROUP BY clause to

know how to group the results of the functions), a window function works on a subset

of rows (called the window frame).

There are a number of functions specific for use as window functions (see also

https://dev.mysql.com/doc/refman/en/window-function-descriptions.html):

CUME_DIST(): Cumulative distribution value

DENSE_RANK(): Rank of current row within its partition, without gaps

FIRST_VALUE(): Value of argument from first row of window frame

LAG(): Value of argument from row lagging current row within partition

LAST_VALUE(): Value of argument from last row of window frame

LEAD(): Value of argument from row leading current row within partition

NTH_VALUE(): Value of argument from N-th row of window frame

NTILE(): Bucket number of current row within its partition

PERCENT_RANK(): Percentage rank value

RANK(): Rank of current row within its partition, with gaps

ROW_NUMBER(): Number of current row within its partition

https://dev.mysql.com/doc/refman/en/window-function-descriptions.html

Additionally, several of the standard aggregate functions can be used (see

https://dev.mysql.com/doc/refman/en/group-by-functions.html for details):

AVG(): Return the average value of the argument

BIT_AND(): Return bitwise AND

BIT_OR(): Return bitwise OR

BIT_XOR(): Return bitwise XOR

COUNT(): Return a count of the number of rows returned

MAX(): Return the maximum value

MIN(): Return the minimum value

STDDEV_POP(), STDDEV(), STD(): Return the population standard deviation

STDDEV_SAMP(): Return the sample standard deviation

SUM(): Return the sum

VAR_POP(), VARIANCE(): Return the population standard variance

VAR_SAMP(): Return the sample variance

In the execution order, window functions are evaluated after HAVING and before ORDER

BY. A window function can be included as a field in ORDER BY but not in GROUP BY.

Syntax

There are two different ways to use window function (full details in

https://dev.mysql.com/doc/refman/en/window-functions.html):

Inline:

SELECT …

function() OVER [partition_clause]

[order_clause]

[frame_clause]

…

Named windows:

SELECT …

function() OVER window_name

…

WINDOW window_name AS ([partition_clause]

[order_clause]

[frame_clause]) Named windows allows you to re-use a window definition which can make it

easier to understand what the query is doing.

There are three parts to the window definition:

partition_clause: What to partition by – this is similar to a GROUP BY clause.

https://dev.mysql.com/doc/refman/en/group-by-functions.html

https://dev.mysql.com/doc/refman/en/window-functions.html

order_clause: What to order by – this is similar to an ORDER BY clause.

frame_clause: What the frame consists of; essentially the number of rows to

include for the window frame before and after the current row. For windows

with an order clause, the default frame is all rows previous in the partition;

without an order clause the default frame is all rows in the partition.

Not all parts have a meaning for all functions. For example, the ROW_NUMBER() function

will always apply to a single row irrespective of the frame clause.

As a trivial example to show the difference between the two syntaxes, consider the

following example where the cities in France and United Kingdom are found. The inline

version of the query is:

SELECT city_id, city, country_id,

ROW_NUMBER() OVER(PARTITION BY country_id) AS RowNumber

FROM sakila.city

WHERE country_id IN (34, 102)

ORDER BY city;

Script: window_country_inline.sql

The version using a named window is:

SELECT city_id, city, country_id,

ROW_NUMBER() OVER w_country AS RowNumber

FROM sakila.city

WHERE country_id IN (34, 102)

WINDOW w_country AS (PARTITION BY country_id)

ORDER BY city;

Script: window_country_named.sql

These two queries do the same thing, so you can choose the syntax that suits you the

best, but remember that a named window can be used multiple times in the same

query. The result of the queries is:

+---------+-----------------+------------+-----------+

| city_id | city | country_id | RowNumber |

+---------+-----------------+------------+-----------+

| 88 | Bradford | 102 | 1 |

| 92 | Brest | 34 | 1 |

| 149 | Dundee | 102 | 2 |

| 297 | Le Mans | 34 | 2 |

| 312 | London | 102 | 3 |

| 494 | Southampton | 102 | 4 |

| 495 | Southend-on-Sea | 102 | 5 |

| 496 | Southport | 102 | 6 |

| 500 | Stockport | 102 | 7 |

| 543 | Toulon | 34 | 3 |

| 544 | Toulouse | 34 | 4 |

| 589 | York | 102 | 8 |

+---------+-----------------+------------+-----------+

Introductory Examples

As an example exploring window functions some more consider the following example.

The recursive common table expression is used to create 10 rows where val1 for the

first five rows (id = 1-5) is “abc” and for the last five rows (id = 6-10) is “def”. The val2

column contains a random integer between 0 and 9.

The window functions work at two scopes: global, i.e. across all rows, and partitioned by

val1. The complete example is:

WITH RECURSIVE rand_data(id, val1, val2) AS

(SELECT 1, 'abc', FLOOR(RAND()*10)

UNION

SELECT id+1, IF(id < 5, 'abc', 'def'), FLOOR(RAND()*10)

FROM rand_data

WHERE id < 10

)

SELECT id, val1, val2,

SUM(val2) OVER() AS Total,

SUM(val2) OVER w_all_id AS RunningTotal,

ROUND(AVG(val2) OVER w_all_running, 2) AS RunningAverage,

RANK() OVER w_all_val2 AS 'Rank',

SUM(val2) OVER w_val1_id AS Val1RunningTotal,

ROUND(AVG(val2) OVER w_val1_running, 2) AS Val1RunningAverage,

RANK() OVER w_val1_val2 AS Val1Rank

FROM rand_data

WINDOW w_all_id AS (ORDER BY id),

w_all_val2 AS (ORDER BY val2 DESC),

w_all_running AS (ORDER BY id

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING),

w_val1_id AS (PARTITION BY val1

ORDER BY id

ROWS UNBOUNDED PRECEDING),

w_val1_val2 AS (PARTITION BY val1

ORDER BY val2 DESC

ROWS UNBOUNDED PRECEDING),

w_val1_running AS (PARTITION BY val1

ORDER BY id

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)

ORDER BY id;

Script: window_rand_data.sql

For the w_val1_id and w_val1_val2 windows, the ROW UNBOUNDED PRECEDING is not

needed as it is being set to the default (for windows without an ORDER BY the default is

ROW UNBOUNDED PRECEDING ROW UNBOUNDED FOLLOWING), but it is OK to include it

anyway and shows how to specify an unbounded frame.

Try to execute the query and notice the difference between the “global” scope windows

(no PARTITION BY clause) and the windows partitioning by val1. An example out is

(since the data is random, the output will differ for each execution):

The Total column (SUM(val2) OVER() AS Total) is the same you would have gotten if

you did the sum grouping all rows into a single row. The global frames span all rows

whereas the frames partitioned by val1 reset when the value of val1 changes.

The rest of the lab is dedicated to exploring how common table expressions and window

functions can be used.

Sales Query

Non-CTE

A more realistic query is to check sales data per month and calculate the change

compared to the previous month. Sales per month can be found as:

SELECT DATE_FORMAT(r.rental_date, '%Y-%m-01') AS FirstOfMonth,

SUM(p.amount) as Sales

FROM sakila.payment p

INNER JOIN sakila.rental r USING (rental_id)

GROUP BY FirstOfMonth;

Script: monthly_sales.sql

Which gives:

+--------------+----------+

| FirstOfMonth | Sales |

+--------------+----------+

| 2005-05-01 | 4823.44 |

| 2005-06-01 | 9629.89 |

| 2005-07-01 | 28368.91 |

| 2005-08-01 | 24070.14 |

| 2006-02-01 | 514.18 |

+--------------+----------+

By joining on the query itself, it is possible to include the change in sales from the

previous month. In MySQL 5.7 and earlier this could be done using subqueries in the

FROM part, like:

SELECT YEAR(cur.FirstOfMonth) AS 'Year',

MONTHNAME(cur.FirstOfMonth) AS 'Month',

cur.Sales, (cur.Sales - IFNULL(prev.Sales, 0)) AS Delta

FROM (





GROUP BY FirstOfMonth

) cur

LEFT OUTER JOIN (






) prev ON prev.FirstOfMonth = cur.FirstOfMonth - INTERVAL 1

MONTH

ORDER BY cur.FirstOfMonth;

Script: non-cte_monthly_sales.sql

While this gives the correct result, it has some drawbacks:

The logic is duplicated

It is hard to read

The subquery is executed twice meaning execution and storage (of the subquery

result) overhead.

It is error prone – when you change the business logic, you have to remember to

update both subqueries.

A better approach is to use a common table expression.

Using CTE

The subquery in the sales query can be rewritten as a common table expression and

then used twice to iterate over the current and previous months and get the sales for

the current month and the difference to the previous month:

WITH monthly_sales(FirstOfMonth, Sales) AS

(SELECT DATE_FORMAT(r.rental_date, '%Y-%m-01') AS FirstOfMonth,





)

SELECT YEAR(cur.FirstOfMonth) AS 'Year',

MONTHNAME(cur.FirstOfMonth) AS 'Month',

cur.Sales, (cur.Sales - IFNULL(prev.Sales, 0)) AS Delta

FROM monthly_sales cur

LEFT OUTER JOIN monthly_sales prev

ON prev.FirstOfMonth = cur.FirstOfMonth - INTERVAL 1 MONTH

ORDER BY cur.FirstOfMonth;

Script: cte_monthly_sales.sql

Note: This assumes that the sale is 0 in months where there are no sales data.

Without a common table expression, it would have been necessary to write the

subquery twice. Additionally, the common table expression helps separate the various

parts of the query: it becomes clearer how the sales per month is calculated and that

the source data of the current and previous months is the same. So, not only will this

perform better than in-lining the subquery (it’s executed once instead of twice), it is less

error prone, and easier to read.

The result of the query is:

+------+----------+----------+----------+

| Year | Month | Sales | Delta |

+------+----------+----------+----------+

| 2005 | May | 4823.44 | 4823.44 |

| 2005 | June | 9629.89 | 4806.45 |

| 2005 | July | 28368.91 | 18739.02 |

| 2005 | August | 24070.14 | -4298.77 |

| 2006 | February | 514.18 | 514.18 |

+------+----------+----------+----------+

A third possibility is to add the use of a window function.

Using CTE and Window Function

There is a window function that can simplify the query further. The LAG() function

(https://dev.mysql.com/doc/refman/8.0/en/window-function-

descriptions.html#function_lag) takes a value from an earlier row (the default is the

previous row); this is an example of a non-aggregate window function. Using this, it is

only necessary to reference the monthly sales once:

WITH monthly_sales(FirstOfMonth, Sales) AS






)

https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_lag

https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_lag

SELECT YEAR(FirstOfMonth) AS 'Year',

MONTHNAME(FirstOfMonth) AS 'Month',

Sales,

(Sales - LAG(Sales, 1, 0) OVER (ORDER BY FirstOfMonth)) AS Delta

FROM monthly_sales

ORDER BY FirstOfMonth;

Script: window_monthly_sales_1.sql

This almost works as expected – before discussing what the problem with the query is,

let’s look at the result:

+------+----------+----------+-----------+


+------+----------+----------+-----------+

| 2005 | May | 4823.44 | 4823.44 |

| 2005 | June | 9629.89 | 4806.45 |

| 2005 | July | 28368.91 | 18739.02 |

| 2005 | August | 24070.14 | -4298.77 |

| 2006 | February | 514.18 | -23555.96 |

+------+----------+----------+-----------+

For the months in 2005 this returns the same result as for the original query (in

cte_monthly_sales.sql), but what happens for February 2006? The LAG() function

uses the previous row – not the previous month. One solution is to add a recursive

common table expression to ensure we include all months even if there are no sales

data. This can be done as follows:

WITH RECURSIVE months(FirstOfMonth) AS

(SELECT '2005-05-01'

UNION

SELECT FirstOfMonth + INTERVAL 1 MONTH

FROM months

WHERE FirstOfMonth < '2006-02-01'

),

monthly_sales(FirstOfMonth, Sales) AS






)

SELECT YEAR(FirstOfMonth) AS 'Year',

MONTHNAME(FirstOfMonth) AS 'Month',

IFNULL(Sales, 0) AS Sales,

(IFNULL(Sales, 0) - LAG(IFNULL(Sales, 0), 1, 0)

OVER (ORDER BY FirstOfMonth)) AS Delta

FROM months

LEFT OUTER JOIN monthly_sales USING (FirstOfMonth)

ORDER BY FirstOfMonth;

Script: window_monthly_sales_2.sql

Now the query will include all months between May 2005 and February 2006 (both

included):

+------+-----------+----------+-----------+


+------+-----------+----------+-----------+

| 2005 | May | 4823.44 | 4823.44 |

| 2005 | June | 9629.89 | 4806.45 |

| 2005 | July | 28368.91 | 18739.02 |

| 2005 | August | 24070.14 | -4298.77 |

| 2005 | September | 0.00 | -24070.14 |

| 2005 | October | 0.00 | 0.00 |

| 2005 | November | 0.00 | 0.00 |

| 2005 | December | 0.00 | 0.00 |

| 2006 | January | 0.00 | 0.00 |

| 2006 | February | 514.18 | 514.18 |

+------+-----------+----------+-----------+

Which of the monthly sales examples gives the right result depends on the question you

want to answer and your business requirements.

Best Customer per Store

A common query pattern is to find the data associated with the largest or smallest value

in a set. For example: “what is the largest city in each country”, “which sales

representative sold most for each region”, etc.

Consider the sakila database, it could be interesting to know which customer spends

the most money for each store. Let’s first see how to do this without using common

table expressions or window functions.

Non-CTE

There are a few ways to find that without using common table expressions, one is:

SELECT c.store_id, c.customer_id, c.first_name, c.last_name, c.Amount

FROM (SELECT c1.store_id, c1.customer_id,

c1.first_name, c1.last_name,

SUM(p1.amount) AS Amount

FROM sakila.customer c1

INNER JOIN sakila.payment p1 USING (customer_id)

GROUP BY c1.store_id, c1.customer_id

) c

INNER JOIN (

SELECT store_id, MAX(Amount) AS MaxStoreAmount

FROM (SELECT store_id, customer_id, SUM(p2.amount) AS Amount

FROM sakila.customer c2

INNER JOIN sakila.payment p2 USING (customer_id)

GROUP BY store_id, customer_id

) tmp

GROUP BY store_id

) s ON s.store_id = c.store_id AND s.MaxStoreAmount = c.Amount;

Script: best_customer_non-cte.sql

This is not exactly easy to follow. The first subquery (c) finds how much money each

customer has spent and which store the customer is associated with. The second

subquery has two parts – and outer and inner part. The inner part again finds the

amount spent for each customer; the outer then finds the maximum amount for each

store.

The result is:

+----------+-------------+------------+-----------+--------+

| store_id | customer_id | first_name | last_name | Amount |

+----------+-------------+------------+-----------+--------+

| 1 | 148 | ELEANOR | HUNT | 216.54 |

| 2 | 526 | KARL | SEAL | 221.55 |

+----------+-------------+------------+-----------+--------+

Changing the subqueries to common table expressions is the first way to improve the

query.

Using CTE

When the same query is re-written to use a common table expression, there is no need

to duplicate the subquery finding the amount of money spent per customer:

WITH customer_amount (store_id, customer_id,

first_name, last_name, Amount) AS

(SELECT c.store_id, c.customer_id,

c.first_name, c.last_name, SUM(p.amount)

FROM sakila.customer c

INNER JOIN sakila.payment p USING (customer_id)


),

store_amount (store_id, MaxStoreAmount) AS

(SELECT store_id, MAX(Amount)

FROM customer_amount

GROUP BY store_id

)

SELECT ca.store_id, ca.customer_id,

ca.first_name, ca.last_name, ca.Amount

FROM customer_amount ca

INNER JOIN store_amount sa ON sa.store_id = ca.store_id

AND sa.MaxStoreAmount = ca.Amount;

Script: best_customer_cte.sql

However, this can be improved even further.

Using CTE and Window Function

When adding the use of window functions, you can use the RANK() function partitioning

by the store and ordering by the sales amount descending. Doing that, the customer

spending the most money per store can simply be found looking for the rows where

RANK() returns 1:

WITH customer_amount (store_id, customer_id,

first_name, last_name, Amount, `Rank`) AS

(SELECT c.store_id, c.customer_id,

c.first_name, c.last_name, SUM(p.amount),

RANK() OVER(

PARTITION BY c.store_id

ORDER BY SUM(p.amount) DESC

) AS 'Rank'

FROM sakila.customer c

INNER JOIN sakila.payment p USING (customer_id)


)

SELECT store_id, customer_id,

first_name, last_name, Amount

FROM customer_amount

WHERE `Rank` = 1;

Script: best_customer_cte_window.sql

Now there is just one subquery and it is trivial to change the query for example to return

the top three customers per store.

Fun

If you would like to have some fun with common table expressions, then the following

query generates the Mandelbrot set as ASCII-art:

WITH RECURSIVE x(i) AS (

SELECT CAST(0 AS DECIMAL(13, 10))

UNION ALL

SELECT i + 1

FROM x

WHERE i < 101

),

Z(Ix, Iy, Cx, Cy, X, Y, I) AS (

SELECT Ix, Iy, X, Y, X, Y, 0

FROM (SELECT CAST(-2.2 + 0.031 * i AS DECIMAL(13, 10)) AS X,

i AS Ix FROM x) AS xgen

CROSS JOIN (

SELECT CAST(-1.5 + 0.031 * i AS DECIMAL(13, 10)) AS Y,

i AS iY FROM x

) AS ygen

UNION ALL

SELECT Ix, Iy, Cx, Cy,

CAST(X * X - Y * Y + Cx AS DECIMAL(13, 10)) AS X,

CAST(Y * X * 2 + Cy AS DECIMAL(13, 10)), I + 1

FROM Z

WHERE X * X + Y * Y < 16.0

AND I < 27

),

Zt (Ix, Iy, I) AS (

SELECT Ix, Iy, MAX(I) AS I

FROM Z

GROUP BY Iy, Ix

ORDER BY Iy, Ix

)

SELECT GROUP_CONCAT(

SUBSTRING(

' .,,,-----++++%%%%@@@@#### ',

GREATEST(I, 1),

1

) ORDER BY Ix SEPARATOR ''

) AS 'Mandelbrot Set'

FROM Zt

GROUP BY Iy

ORDER BY Iy;

Script: mandelbrot_set.sql – Based on: https://wiki.postgresql.org/wiki/Mandelbrot_set

Not at all useful, but it does show the power of common table expressions. This is best

executed in MySQL Shell or the mysql command-line client.

This concludes the guided part of the lab. You are encouraged to continue to playing

with common table expressions and window functions.

References

If you want to learn more about the MySQL Document Store, the following references

are useful:

WITH Syntax (Common Table Expressions):

https://dev.mysql.com/doc/refman/en/with.html

Window Functions: https://dev.mysql.com/doc/refman/en/window-

functions.html

Aggregate (GROUP BY) Function Descriptions:

https://dev.mysql.com/doc/refman/en/group-by-functions.html

MySQL Workbench: https://dev.mysql.com/doc/workbench/en/

Blogs:

o MySQL 8.0 Labs: [Recursive] Common Table Expressions in MySQL

(CTEs): http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-

table-expressions-in-mysql-ctes/


(CTEs), Part Two – how to generate series:

http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-table-

expressions-in-mysql-ctes-part-two-how-to-generate-series/


(CTEs), Part Three – hierarchies: http://mysqlserverteam.com/mysql-8-0-

labs-recursive-common-table-expressions-in-mysql-ctes-part-three-

hierarchies/

o http://mysqlserverteam.com/mysql-8-0-1-recursive-common-table-

expressions-in-mysql-ctes-part-four-depth-first-or-breadth-first-traversal-

transitive-closure-cycle-avoidance/ http://mysqlserverteam.com/mysql-

8-0-1-recursive-common-table-expressions-in-mysql-ctes-part-four-

depth-first-or-breadth-first-traversal-transitive-closure-cycle-avoidance/

https://dev.mysql.com/doc/workbench/en/

http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-table-expressions-in-mysql-ctes/

http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-table-expressions-in-mysql-ctes/

http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-table-expressions-in-mysql-ctes-part-two-how-to-generate-series/

http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-table-expressions-in-mysql-ctes-part-two-how-to-generate-series/

http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-table-expressions-in-mysql-ctes-part-three-hierarchies/



http://mysqlserverteam.com/mysql-8-0-1-recursive-common-table-expressions-in-mysql-ctes-part-four-depth-first-or-breadth-first-traversal-transitive-closure-cycle-avoidance/



Documents

Using MySQL Common Table Expressions and Window Functions ...€¦ · Common table expressions can be divided into two categories: non-recursive and recursive. Non-recursive common