A Practical Introduction to the MySQL Document Store [HOL1703]

Jesper Wisborg Krogh

Senior Principal Technical Support Engineer

Lig Isler-Turmelle

Principal Technical Support Engineer

Prerequisites

Note: This is not required when using the laptops at HOL1703 at Oracle OpenWorld 2018.

It is assumed the following software is already installed on the machine where you try the examples in this workbook. If you are attending the hands-on lab session A Practical Introduction to the MySQL Document Store [HOL1703] at Oracle OpenWorld 2018, you do not need to do anything, and you can skip to the next section.

Software List

MySQL Server 8.0.12 or later

MySQL Shell 8.0.12 or later

MySQL Connector/Python 8.0.12 or later

MySQL Connector/Node.js 8.0.12 or later

MySQL Connector/J 8.0.12 or later

Node.js

Data List

The world database: http://downloads.mysql.com/docs/world.sql.gz

The world_x database: http://downloads.mysql.com/docs/world_x-db.tar.gz

An empty hol1703 schema

Installing the Software

The following steps can be used to install the required software on Oracle Linux 7. For other platforms the steps will be different.

# Install the latest public yum repo
# and enable the required ol7_developer repos:
cd /etc/yum.repos.d/
mv public-yum-ol7.repo public-yum-ol7.repo.bak
wget http://yum.oracle.com/public-yum-ol7.repo
yum-config-manager --enable ol7_developer_nodejs8
yum-config-manager --enable ol7_developer_EPEL

# Install MySQL Repo
wget https://dev.mysql.com/get/mysql80-community-release-el7-1.noarch.rpm
yum install mysql80-community-release-el7-1.noarch.rpm


# Install Python pip
yum install python-pip python-wheel
pip install --upgrade pip

# Install MySQL Server, Shell, and connectors
yum install mysql-community-client \
    mysql-community-common \
    mysql-community-devel \
    mysql-community-libs \
    mysql-community-libs-compat \
    mysql-community-server \
    mysql-connector-java-8.0.* \
    mysql-shell \
    java-1.8.0-openjdk-devel \
    nodejs-8* \
    protobuf-java

# Start MySQL Server for the first time
# (initializes the data directory)
# and update the root password.
systemctl start mysqld
passwd=$(grep 'A temporary password is generated for root@localhost' \
    /var/log/mysqld.log | sed -re 's/^.* (.+)$/\1/')
mysql --user=root --password="${passwd}" --connect-expired-password \
    -e "SET PASSWORD = '<some secure password>'"
unset passwd

Note: It is recommended not to put the password on the command line. The above is done for simplicity.

Installing Data

The following instructions assume the previously mentioned software has been installed, that MySQL Server has been started, and that the root password has been updated (by default it is set to an expired password that can be found in /var/log/mysqld.log after MySQL has been started the first time).

The data needed for this hands-on lab can be installed following these steps:

wget http://downloads.mysql.com/docs/world.sql.zip
wget http://downloads.mysql.com/docs/world_x-db.zip
unzip world.sql.zip
unzip world_x-db.zip
mysql --user=root --password \
    --execute "SOURCE world.sql; SOURCE world_x-db/world_x.sql;"

HOL Information

The following information is useful for the hands-on lab session at Oracle OpenWorld 2018:

Linux Username: lab
Linux Password: oracle
MySQL Username: hol1703
MySQL Password: hol@OOW18
MySQL Schemas: hol1703, world, world_x
Connectors and APIs Installed:
  MySQL Connector/Python 8.0.12: mysql.connector (PEP 249 Python DB API) and mysqlx (X DevAPI, for the Document Store)
  MySQL Connector/Node.js 8.0.12: @mysql/xdevapi
  MySQL Connector/J 8.0.12: java.sql.*
Documentation: This workbook as well as documentation for the X DevAPI, MySQL Connector/Python, MySQL Connector/Node.js, and MySQL Shell can be found in the /home/lab/docs directory. Firefox is set to open with this information as well as an overview of the available documentation as the home page.

This hands-on lab will primarily use Python for the examples. However, feel free to use JavaScript (Node.js) or Java instead if you prefer. HOL1706, scheduled for Tuesday 11:15am to 12:15pm in this room, will be exclusively about Node.js and the MySQL Document Store.

Tip: A login path has been configured for the hol1703 user, so it is not necessary to specify the username and password when using the mysql command-line client and MySQL Shell.

The MySQL Document Store

The MySQL Document Store was developed throughout the MySQL Server 5.7 lifetime. The server side is implemented through the X Plugin (called mysqlx in the information_schema.PLUGINS view), which was first introduced as a beta release with MySQL Server 5.7.12. The X Plugin reached general availability (GA) status with MySQL Server 8.0.11 and is now a built-in plugin that is enabled by default. That is, on the server side you do not need to do anything to start using the MySQL Document Store.

mysql> SELECT *
       FROM information_schema.PLUGINS
       WHERE PLUGIN_NAME = 'mysqlx'\G
*************************** 1. row ***************************
           PLUGIN_NAME: mysqlx
        PLUGIN_VERSION: 1.0
         PLUGIN_STATUS: ACTIVE
           PLUGIN_TYPE: DAEMON
   PLUGIN_TYPE_VERSION: 80012.0
        PLUGIN_LIBRARY: NULL
PLUGIN_LIBRARY_VERSION: NULL
         PLUGIN_AUTHOR: Oracle Corp
    PLUGIN_DESCRIPTION: X Plugin for MySQL
        PLUGIN_LICENSE: GPL
           LOAD_OPTION: ON
1 row in set (0.00 sec)

The MySQL Document Store has three main components:

X Plugin: This is the server-side plugin that provides support for the X DevAPI.

X Protocol: The protocol used for an application to communicate with the X Plugin.

The X DevAPI: The API used with the X Protocol.

Collectively these components are known as the MySQL Document Store.

X Plugin Port and Other Variables

Because the X Plugin uses a different protocol from the traditional MySQL protocol, it needs to listen on a different port from the classic protocol (which uses port 3306 by default). By default the X Plugin uses port 33060. This can be configured using the mysqlx_port option. Similarly, connections through a UNIX socket file use a separate socket file, configured with the mysqlx_socket option.
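The following sketch illustrates how the X Plugin port appears on the client side. It builds an X Protocol connection URI that targets port 33060 instead of the classic port. The helper name build_x_uri is hypothetical and is not part of any connector. It assumes that Connector/Python's mysqlx.get_session() accepts a mysqlx:// URI with percent-encoded credentials; verify that assumption against the connector documentation.

```python
from urllib.parse import quote

# Default port of the X Plugin (mysqlx_port); the classic protocol uses 3306.
DEFAULT_X_PORT = 33060


def build_x_uri(user, host, password=None, port=DEFAULT_X_PORT, schema=None):
    """Build a mysqlx:// connection URI (hypothetical helper for illustration)."""
    # Percent-encode the credentials so characters such as '@' or ':'
    # cannot be confused with the URI separators.
    userinfo = quote(user, safe="")
    if password is not None:
        userinfo += ":" + quote(password, safe="")
    uri = "mysqlx://{0}@{1}:{2}".format(userinfo, host, port)
    if schema:
        # An optional default schema goes in the path component.
        uri += "/" + quote(schema, safe="")
    return uri


if __name__ == "__main__":
    print(build_x_uri("hol1703", "localhost", password="hol@OOW18",
                      schema="hol1703"))
```

With the lab credentials this prints mysqlx://hol1703:hol%40OOW18@localhost:33060/hol1703. The @ in the password must be encoded because it would otherwise be read as the separator before the host name.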

All of the variables for the X Plugin are prefixed with mysqlx_. The complete list of variables with their default values (using the MySQL Server RPM for Oracle Linux/RHEL 7) is:

mysql> SELECT *
       FROM performance_schema.global_variables
       WHERE VARIABLE_NAME LIKE 'mysqlx%';
+-----------------------------------+-----------------------------+
| VARIABLE_NAME                     | VARIABLE_VALUE              |
+-----------------------------------+-----------------------------+
| mysqlx_bind_address               | *                           |
| mysqlx_connect_timeout            | 30                          |
| mysqlx_document_id_unique_prefix  | 0                           |
| mysqlx_idle_worker_thread_timeout | 60                          |
| mysqlx_interactive_timeout        | 28800                       |
| mysqlx_max_allowed_packet         | 67108864                    |
| mysqlx_max_connections            | 100                         |
| mysqlx_min_worker_threads         | 2                           |
| mysqlx_port                       | 33060                       |
| mysqlx_port_open_timeout          | 0                           |
| mysqlx_read_timeout               | 30                          |
| mysqlx_socket                     | /var/run/mysqld/mysqlx.sock |
| mysqlx_ssl_ca                     |                             |
| mysqlx_ssl_capath                 |                             |
| mysqlx_ssl_cert                   |                             |
| mysqlx_ssl_cipher                 |                             |
| mysqlx_ssl_crl                    |                             |
| mysqlx_ssl_crlpath                |                             |
| mysqlx_ssl_key                    |                             |
| mysqlx_wait_timeout               | 28800                       |
| mysqlx_write_timeout              | 60                          |
+-----------------------------------+-----------------------------+
21 rows in set (0.00 sec)

Several of the options have counterparts for the classic MySQL protocol. For these, the variable name is the classic name with mysqlx_ prefixed (for example, mysqlx_max_connections and max_connections). For the purpose of this hands-on lab, the default values can be used.

The X DevAPI

From an end-user perspective, the most interesting part of the MySQL Document Store is the X DevAPI. This is the API used to interact with the MySQL Document Store from your programs and from MySQL Shell.

The X DevAPI is designed from the ground up with modern-day usage in mind. It is available for a range of languages, for example: Python (MySQL Connector/Python), JavaScript (MySQL Connector/Node.js), PHP (mysql_xdevapi PECL extension), Java (MySQL Connector/J), C++ (MySQL Connector/C++), and .NET (MySQL Connector/NET).

The X DevAPI is uniform across the supported programming languages while still maintaining the characteristics of each language. For example, the method to get a session is get_session() in Python but getSession() in Node.js.

The X DevAPI has three different parts. Which part you should use depends on how you want to interact with MySQL:

Collections: The create-read-update-delete (CRUD) methods to work with JSON documents, i.e. using MySQL as a document store. This is a NoSQL API.

SQL Tables: The CRUD methods to work with SQL (relational) tables. This is also a NoSQL API.

SQL: This can be used to execute arbitrary SQL statements against both collections and SQL tables.

The easiest way to try the X DevAPI is to use MySQL Shell. MySQL Shell is a relatively new command-line tool that supports not only SQL statements but also Python and JavaScript. This makes it possible to test code before implementing it in an actual program, or to use MySQL Shell to execute scripts that include Python or JavaScript routines.

Note: Python and JavaScript in MySQL Shell do not use the connectors, so while the API is the same, there are some differences in their use.

Lab Exercises

This lab will primarily explore the MySQL Document Store using MySQL Shell. This allows you to explore the X DevAPI while still having easy access to SQL statements, for example to look at the underlying table definition. The examples in this section use Python, but they can equally well be executed using JavaScript. If you choose to use JavaScript, remove the underscores in the method names and make the next letter upper case. For example, replace the get_session() method with getSession().
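The renaming rule above can be written as a small function, which may help when translating the Python examples in this lab to JavaScript. This is a hypothetical helper for illustration only. It applies the rule mechanically and does not check that the resulting name actually exists in the JavaScript API.

```python
import re


def python_to_js_name(method_name):
    """Translate an X DevAPI method name from Python to JavaScript style.

    Hypothetical helper: removes each underscore and upper-cases the letter
    that follows it, e.g. get_session -> getSession.
    """
    return re.sub(r"_([a-z])", lambda match: match.group(1).upper(), method_name)


for name in ("get_session", "create_collection", "get_collection", "fetch_one"):
    print(name, "->", python_to_js_name(name))
```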

Tip: Most of the examples are also available as MySQL Connector/Python source code in the /home/lab/bin directory. When this is the case, the file name is shown in the caption of the example.

Start MySQL Shell

MySQL Shell is started using the mysqlsh command in the terminal. Optionally, specify the language mode you want to use. The language mode can also be set after starting MySQL Shell. The following table shows how to specify the language mode on the command line or at the MySQL Shell prompt.

Language Mode   Command-Line   MySQL Shell Prompt
JavaScript      --js           \js
Python          --py           \py
SQL             --sql          \sql

The default language mode is JavaScript. The exercises in this lab will include examples of changing the language mode.

Tip: When you switch language mode, you can keep using your existing connection if you are already connected to a MySQL instance.

For now, start MySQL Shell without any arguments:

[lab@localhost ~]$ mysqlsh
MySQL Shell 8.0.12

Copyright (c) 2016, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.

mysql-js>

You can now change to Python mode using the \py command:

mysql-js> \py
Switching to Python mode...
mysql-py>

You can connect to MySQL using the \connect command:

mysql-py> \connect hol1703@localhost
Creating a session to 'hol1703@localhost'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 12 (X protocol)
Server version: 8.0.12 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
mysql-py> session
<Session:hol1703@localhost>

No password is required as MySQL Shell has been set up to fetch the password from a login path. If this has not been done, MySQL Shell will interactively ask for the password and offer to store it for you.

The session can be accessed through the session object. This can be useful, for example, to fetch a schema or to control transactions. You will see examples of this later in the lab.

You can now set the default schema (this can also be done when creating the connection) using the \use command:

mysql-py> \use hol1703
Default schema `hol1703` accessible through db.

Notice how MySQL Shell assigned the hol1703 schema to the db object. You can now use the db object to access the schema-specific methods.

Before continuing, let’s pause for a moment to consider the prompt.

The Prompt

The default prompt includes information about the connection, whether it uses SSL, the default schema, and the language mode. An example of the default prompt can be seen in the following figure:

[Figure: the default MySQL Shell prompt]

MySQL Shell in the hands-on lab virtual machine has been set up to use the Powerline and Awesome fonts. This gives a prompt with the same information as the default prompt but with some additional custom characters:

[Figure: the MySQL Shell prompt using the Powerline and Awesome fonts]

It is beyond the scope of this lab session to go through the installation of the Powerline and Awesome fonts. If you are interested, you can see an example of installing the required fonts in https://mysql.wisborg.dk/2018/09/04/awesome-mysql-shell-prompt/.

Before continuing to learn how to work with the MySQL Document Store from MySQL Shell, let’s look at what you can do if you need help.

Built-In Help

A great feature of MySQL Shell is the ability to obtain help directly within MySQL Shell. This is not limited to the standard --help command-line argument. Help is also available inside MySQL Shell, including for each object type.


You can get general help, for example about the available commands (you have already used the \py, \connect, and \use commands):

mysql-py> \?
The Shell Help is organized in categories and topics. To get help for a specific category or topic use: \? <pattern>

The <pattern> argument should be the name of a category or a topic.

The pattern is a filter to identify topics for which help is required, it can use the following wildcards:

- ? matches any single charecter.
- * matches any character sequence.

The following are the main help categories:

- AdminAPI Introduces to the dba global object and the InnoDB cluster administration API.
- Shell Commands Provides details about the available built-in shell commands.
- ShellAPI Contains information about the shell and util global objects as well as the mysql module that enables executing SQL on MySQL Servers.
- SQL Syntax Entry point to retrieve syntax help on SQL statements.
- X DevAPI Details the mysqlx module as well as the capabilities of the X DevAPI which enable working with MySQL as a Document Store

The available topics include:

- The dba global object and the classes available at the AdminAPI.
- The mysqlx module and the classes available at the X DevAPI.
- The mysql module and the global objects and classes available at the ShellAPI.
- The functions and properties of the classes exposed by the APIs.
- The available shell commands.
- Any word that is part of an SQL statement.

SHELL COMMANDS

The shell commands allow executing specific operations including updating the shell configuration.

The following shell commands are available:

- \ Start multi-line input when in SQL mode.
- \connect (\c) Connects the shell to a MySQL server and assigns the global session.
- \exit Exits the MySQL Shell, same as \quit.
- \help (\?,\h) Prints help information about a specific topic.
- \history View and edit command line history.
- \js Switches to JavaScript processing mode.
- \nowarnings (\w) Don't show warnings after every statement.
- \option Allows working with the available shell options.
- \py Switches to Python processing mode.
- \quit (\q) Exits the MySQL Shell.
- \reconnect Reconnects the global session.
- \rehash Refresh the autocompletion cache.
- \source (\.) Loads and executes a script from a file.
- \sql Switches to SQL processing mode.
- \status (\s) Print information about the current global session.
- \use (\u) Sets the active schema.
- \warnings (\W) Show warnings after every statement.

GLOBAL OBJEECTS

The following modules and objects are ready for use when the shell starts:

- db Used to work with database schema objects.
- dba Used for InnoDB cluster administration.
- mysql Support for connecting to MySQL servers using the classic MySQL protocol.
- mysqlx Used to work with X Protocol sessions using the MySQL X DevAPI.
- session Represents the currently open MySQL session.
- shell Gives access to general purpose functions and properties.
- util Global object that groups miscellaneous tools like upgrade checker.

For additional information on these global objects use: <object>.help()

EXAMPLES
\? AdminAPI
Displays information about the AdminAPI.

\? \connect
Displays usage details for the \connect command.

\? check_instance_configuration
Displays usage details for the dba.check_instance_configuration function.

\? sql syntax
Displays the main SQL help categories.

The examples at the end show how you can get additional help. For example, to learn how to use the \connect command:

mysql-py> \? connect
NAME
connect - Establishes the shell global session.

SYNTAX
shell.connect(connectionData[, password])

WHERE
connectionData: the connection data to be used to establish the session.
password: The password to be used when establishing the session.

DESCRIPTION
This function will establish the global session with the received connection data.

…

As the help is very extensive, the whole output is not included here.

You can also get help directly about an object you are working with. For example, you have the db object with the hol1703 schema. To get help about the db object, use the help() method of the object:

mysql-py> db.help()
NAME
Schema - Represents a Schema as retrived from a session created using the X Protocol.

DESCRIPTION
View Support

MySQL Views are stored queries that when executed produce a result set.

…

FUNCTIONS
create_collection(name)
Creates in the current schema a new collection with the specified name and retrieves an object representing the new collection created.

…

As the help output shows, a schema object has, for example, a method called create_collection(). You can get additional help about the create_collection() method by passing its name as an argument to the help() method:

mysql-py> db.help("create_collection")
NAME
create_collection - Creates in the current schema a new collection with the specified name and retrieves an object representing the new collection created.

SYNTAX
<Schema>.create_collection(name)

WHERE
name: the name of the collection.

RETURNS
the new created collection.

DESCRIPTION
To specify a name for a collection, follow the naming conventions in MySQL.

Note that to get help for SQL statements, you must be connected to a MySQL instance. You also either need to be in the SQL language mode or explicitly tell MySQL Shell that you want help for an SQL statement. For example: \? SQL Syntax/SELECT.

Feel free to execute the help commands as you go through the exercises.

Built-in Objects

You have already encountered some of the built-in objects of MySQL Shell: the session and db objects. There are, however, more built-in objects. From the output of the \? command you executed a little earlier:

db: Used to work with database schema objects.

dba: Used for InnoDB cluster administration.

mysql: Support for connecting to MySQL servers using the classic MySQL protocol.

mysqlx: Used to work with X Protocol sessions using the MySQL X DevAPI.

session: Represents the currently open MySQL session.

shell: Gives access to general purpose functions and properties.

util: Global object that groups miscellaneous tools like upgrade checker.

The two most useful for this lab are the db and session objects.

You are now ready to create a collection.

Create Collection

Collections are the containers for the documents. In the SQL world, the equivalent of a collection is a table. A collection is always located in a schema (database). You can create a collection using the create_collection() method (createCollection() in JavaScript) of the db object. The method takes the name of the new collection as a string. For example, to create the animals collection and keep a reference to it in the animals object:

mysql-py> animals = db.create_collection("animals")
mysql-py> animals
<Collection:animals>

What does a collection look like? You can check the definition of the underlying table using a SHOW CREATE TABLE SQL statement:

mysql-py> \sql
Switching to SQL mode... Commands end with ;
Fetching table and column names from `hol1703` for auto-completion... Press ^C to stop.
mysql-sql> SHOW CREATE TABLE animals\G
*************************** 1. row ***************************
       Table: animals
Create Table: CREATE TABLE `animals` (
  `doc` json DEFAULT NULL,
  `_id` varbinary(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL,
  PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
mysql-sql> \py
mysql-py>

The collection consists of an InnoDB table with a normal column called doc and a generated column called _id. The doc column is where the JSON document is stored. The _id column is generated by extracting the value of the _id member at the top level of the JSON document. It is the primary key of the table, so it must always be present.

Tip: If you later need to get an object for the animals collection again, you can get it using: animals = db.get_collection("animals")

What happens if you try to create a JSON document without a _id element? Let’s check.

Adding Documents

The first part of CRUD is to create data. In the MySQL Document Store that means adding a JSON document to a collection. This is done using the add() method of the collection object. For example, to add dogs as an animal (the prompt is omitted to make it easy to copy and paste):

dog = {
    "Name": "Dog",
    "Class": "Mammal",
    "Species": ["German Shephard",
                "Labrador Retriever",
                "Poodle"],
    "Claim": "Man's best friend",
}

session.start_transaction()
animals.add(dog)
session.commit()

Script file: animals.py

Tip: At the end of a multi-line statement like the assignment of the dog variable, hit enter twice to exit the multi-line mode.

The first thing to notice here is that there is full transaction support in the Document Store. So, by using MySQL as a document store, you are not giving up the advantages of the transaction model.

The second thing is that JSON documents in Python are just dictionary objects ({…}) for JSON objects and lists ([…]) for JSON arrays. JavaScript (obviously) also has native support for JSON documents. This makes it very easy to work with JSON documents in Python and JavaScript.
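You can see the correspondence between Python and JSON with the standard json module, outside of MySQL. The sketch below serializes the dog dictionary from the example and parses it back. It only illustrates the mapping (dict to JSON object, list to JSON array). It does not show how the connector or MySQL Shell encodes documents internally.

```python
import json

# The dog document from the example above as a plain Python dictionary.
dog = {
    "Name": "Dog",
    "Class": "Mammal",
    "Species": ["German Shephard", "Labrador Retriever", "Poodle"],
    "Claim": "Man's best friend",
}

# Serialize to JSON text: the dict becomes a JSON object and the
# list becomes a JSON array.
dog_json = json.dumps(dog)
print(dog_json)

# Parsing the text again yields an equal dictionary, so documents like
# this one need no manual conversion between Python and JSON.
round_trip = json.loads(dog_json)
print(round_trip == dog)  # True
```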

This was a very simple example. Let’s try a case where three animals are added:

cat = {
    "Name": "Cat",
    "Class": "Mammal",
    "Species": ["Siamese",
                "Persian",
                "Norwegian Forest"]
}

cockatoo = {
    "Name": "Cockatoo",
    "Class": "Bird",
    "Species": ["Sulphur-Crested",
                "Galah",
                "Gang-Gang"]
}

croc = {
    "Name": "Crocodile",
    "Class": "Reptile",
    "Species": ["Saltwater",
                "Nile",
                "Siamese"]
}

session.start_transaction()
stmt = animals.add(cat)
result = stmt.add((cockatoo, croc)).execute()
print("Number of animals added: {0}"
      .format(result.get_affected_items_count()))
print("Number of warnings: {0}".format(result.get_warnings_count()))
print("Generated IDs: {0}".format(result.get_generated_ids()))
session.commit()

Script file: animals.py

The output of this example is similar to (the generated IDs will be different!):

Number of animals added: 3
Number of warnings: 0
Generated IDs: ["00005b8cae6c000000000000000a", "00005b8cae6c000000000000000b", "00005b8cae6c000000000000000c"]

This example still executes just a single statement, but more happens. Let’s go through the example in steps:

1. The three animals are defined.

2. A transaction is started. Even for single-statement operations this is useful, as it allows you to verify that everything worked as expected before committing the change.

3. The cat is added as an animal. In this case, the return value is assigned to the stmt variable (stmt for statement). This allows you to continue to work with the statement before executing it.

4. The cockatoo and crocodile are added, and the statement is executed. This shows two important things: it is possible to add more than one document in one add() call by using a tuple or list, and it is possible to chain the method calls. Steps 3 and 4 could have been combined into one line of code if that is preferred. The second part of the chain here is the execute() method. When coding with a connector, the execute() method must always be used. However, MySQL Shell has a shortcut: when you use a CRUD method and do not assign the returned statement to a variable (this was the case when adding the dog), the statement is executed implicitly. The return value of execute() is assigned to the result variable.

5. The result object contains information about the execution of the statement. The three print() calls print some of this information. You can use this, for example, to determine whether any warnings occurred before committing the transaction. For this discussion, the generated IDs are the most interesting; more about those shortly.

6. The transaction is committed.

As you can see, there is some flexibility in how the statements are created, so you can choose what works best in your workflow.
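A connector program gets no implicit execution, so it has to call execute() itself. The following sketch shows one way to combine the steps above with MySQL Connector/Python. It starts a transaction, adds all documents in one add() call, and commits only if the result reports no warnings. The function name add_documents and the choice to roll back on warnings are illustrative assumptions, not a prescribed pattern. The session and collection arguments are expected to come from the mysqlx module, for example from mysqlx.get_session() followed by get_schema("hol1703").get_collection("animals").

```python
def add_documents(session, collection, docs):
    """Add documents in one transaction and return their generated IDs.

    session and collection are assumed to be a mysqlx Session and Collection
    from MySQL Connector/Python. Unlike MySQL Shell, the connector never
    executes a statement implicitly, so execute() is called explicitly.
    The transaction is rolled back if the statement fails or reports warnings.
    """
    session.start_transaction()
    try:
        # One add() call with a list of documents; execute() sends it.
        result = collection.add(docs).execute()
    except Exception:
        session.rollback()
        raise
    if result.get_warnings_count() > 0:
        # Treat warnings as a reason not to keep the change (a design choice).
        session.rollback()
        return []
    session.commit()
    return result.get_generated_ids()
```

With the documents from the example, a call such as add_documents(session, animals, [cat, cockatoo, croc]) would return the three generated IDs.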

What are the generated IDs? They are the values of the _id element, which is the primary key of the document. The animals that were added did not have _id elements in the documents, so MySQL generated IDs automatically and added them to the documents. The generated IDs consist of three parts (all hex encoded):

A prefix: The prefix is used to distinguish different MySQL instances. This can be configured using the mysqlx_document_id_unique_prefix option. By default it is 0, except in InnoDB Cluster.

A timestamp: This is the time when the MySQL instance was last restarted.

A counter: An unsigned integer that is incremented each time an ID is generated. It follows the same rules as an auto-increment column, using the value of auto_increment_offset as the offset and auto_increment_increment as the increment. If the counter overflows, the timestamp is incremented by one and the counter starts from auto_increment_offset again.
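The sketch below makes the three parts concrete. It splits a generated ID into prefix, timestamp, and counter, and also builds an ID from the three values. It assumes the layout documented for MySQL 8.0 document IDs: 4 hex digits of prefix, 8 hex digits of timestamp (seconds since the Unix epoch), and 16 hex digits of counter. This layout matches the 28 characters of the IDs shown earlier. The function names are hypothetical.

```python
from datetime import datetime, timezone


def decode_document_id(doc_id):
    """Split a generated document ID into (prefix, start time, counter).

    Assumes the MySQL 8.0 layout: 4 hex digits of prefix, 8 hex digits of
    timestamp (seconds since the Unix epoch) and 16 hex digits of counter.
    """
    if len(doc_id) != 28:
        raise ValueError("expected 28 hex digits, got %d" % len(doc_id))
    prefix = int(doc_id[0:4], 16)
    timestamp = int(doc_id[4:12], 16)
    counter = int(doc_id[12:28], 16)
    started = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return prefix, started, counter


def encode_document_id(prefix, timestamp, counter):
    """Build an ID from its parts using the same assumed layout."""
    return "{0:04x}{1:08x}{2:016x}".format(prefix, timestamp, counter)


print(decode_document_id("00005b8cae6c000000000000000a"))
```

For the first ID from the earlier output, the function returns prefix 0, a start time of 2018-09-03 03:45:48 UTC, and counter 10. The next two IDs differ only in the counter (11 and 12), as expected with the default auto_increment_increment of 1. Your own IDs will differ.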

This format was chosen for two reasons. It makes it possible to generate unique IDs across a replication topology. It also generates the IDs in a way that is optimized for the underlying storage: InnoDB stores rows in primary key order, so steadily increasing IDs are appended instead of being inserted at random positions.

If you generate the IDs yourself (or have a natural ID), the ID must be added using the _id member at the top level of the JSON document and must be at most 32 bytes long.
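Before adding documents with your own IDs, you can check the two rules in application code. The following is a hypothetical client-side check, not an API provided by MySQL. It looks for _id at the top level only, mirroring the $._id expression of the generated column, and measures the value in bytes, since the column is varbinary(32). Converting non-string IDs with json.dumps is only an approximation of what the server stores.

```python
import json

MAX_ID_BYTES = 32  # the _id column is varbinary(32)


def check_document_id(doc):
    """Client-side check of a document's _id (hypothetical helper).

    Returns None if the document has no top-level _id, in which case MySQL
    generates one when the document is added. Otherwise returns the _id as
    text, raising ValueError if it is longer than 32 bytes.
    """
    if "_id" not in doc:
        return None
    doc_id = doc["_id"]
    # The generated column stores the unquoted value; for non-string values,
    # the JSON text is used as an approximation of that form.
    text = doc_id if isinstance(doc_id, str) else json.dumps(doc_id)
    if len(text.encode("utf-8")) > MAX_ID_BYTES:
        raise ValueError("_id must be at most %d bytes" % MAX_ID_BYTES)
    return text
```

A nested member such as {"Info": {"_id": "x"}} does not count as the document ID, so the check returns None for it, just as MySQL would generate an ID for such a document.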

Now that you have some data, you can start querying data in it.

Tip: If you have not loaded the data yet or want to reset it, you can execute the /home/lab/bin/animals.py script.

Finding Documents

You can find documents using the find() method. It is somewhat more complex than the add() method as there is support for filtering, ordering, grouping, etc. If you are used to SQL SELECT statements, you can make all of the same modifications to a find statement in the Document Store as you can to a SELECT statement (but joins are currently not supported).

You get a find statement object by invoking the find() method on a collection. You can then use the following methods to modify the query:

fields(): Specify which fields to return from the matching documents.

group_by(): Which fields to group the result by.

having(): A filter that is applied after the data has been grouped.

sort(): Which fields to sort the result by.

limit(): The maximum number of documents to return.

offset(): The offset to use together with the limit() method.

lock_exclusive(): Take an exclusive lock on the matching documents.

lock_shared(): Take a shared lock on the matching documents.

bind(): Used to assign values when parameters are used to set the filter condition.

execute(): Executes the query and returns a document result object.

You may think there is something missing here: how do you set a filter that is evaluated before a possible group by? That is set directly in the find() call.
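These modifiers take effect in the same order as the clauses of a SQL SELECT. The condition given to find() works like WHERE. Then group_by() and having() apply, and finally sort(), offset(), and limit(). The following pure-Python sketch models only that order. Plain Python callables stand in for X DevAPI expressions, and model_find is a hypothetical name. The sketch does not reflect how MySQL actually executes a find statement.

```python
from itertools import groupby


def model_find(docs, condition=None, group_key=None, aggregate=None,
               having=None, sort_key=None, offset=0, limit=None):
    """Conceptual model of the order in which find() modifiers apply.

    Hypothetical helper; aggregate(key, docs) must be given with group_key.
    It is not how MySQL executes a find statement.
    """
    # 1. find(condition): filter individual documents (like WHERE).
    rows = [d for d in docs if condition is None or condition(d)]
    # 2. group_by(): combine documents that share a key into one row.
    if group_key is not None:
        rows = sorted(rows, key=group_key)
        rows = [aggregate(key, list(group))
                for key, group in groupby(rows, key=group_key)]
    # 3. having(): filter the grouped rows (like HAVING).
    if having is not None:
        rows = [r for r in rows if having(r)]
    # 4. sort(), then offset() and limit().
    if sort_key is not None:
        rows = sorted(rows, key=sort_key)
    end = None if limit is None else offset + limit
    return rows[offset:end]


animals = [
    {"Name": "Dog", "Class": "Mammal"},
    {"Name": "Cat", "Class": "Mammal"},
    {"Name": "Cockatoo", "Class": "Bird"},
    {"Name": "Crocodile", "Class": "Reptile"},
]


def count(key, group):
    return {"Class": key, "count": len(group)}


# Classes with at least two animals.
print(model_find(animals, group_key=lambda d: d["Class"], aggregate=count,
                 having=lambda r: r["count"] >= 2))
```

The query prints [{'Class': 'Mammal', 'count': 2}]. If the find() condition also excludes the dog, the filter runs before grouping, so no class has two animals left and the result is empty. A having() condition cannot produce that effect.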

Simple Example

As a simple example, consider a query to find the known dog species (breeds) together with the _id of the document:

stmt = animals.find("Name = :animal")
stmt = stmt.fields("_id", "Species")
stmt = stmt.bind("animal", "Dog")
result = stmt.execute()
doc = result.fetch_one()
print("_id .......: {0}".format(doc["_id"]))
print("Species ...: {0}".format(doc["Species"]))

Script: dog_breeds.py

This outputs a result similar to (the value of the _id will be different):

_id .......: 00005b8cae6c0000000000000017

Species ...: ["German Shephard", "Labrador Retriever", "Poodle"]

The example uses a couple of the methods to refine the query:

A filter clause is added to the find() method. The actual name of the animal is specified using a parameter. This has two advantages: the value is escaped, which reduces the potential for SQL injection, and the statement is easier to reuse.

The fields() method is used to tell MySQL to return only the _id and Species fields. Here the fields are added as separate arguments, but they could also have been specified as a single argument using a tuple or a list.

The bind() method is used to specify which animal name the documents should be filtered by.

One important thing to be aware of is that the document is stored as a binary object. This means that comparisons are done at the binary level, so, unlike the default for SQL tables, they are case sensitive!

In the example, the result is retrieved using the fetch_one() method of the result object. There is also a fetch_all() method to fetch all matching documents.

Tip: Remember you can use the help() method to get information about the available methods. Try using result.help() and explore some of the methods and properties that are available. However, do note that the affected_items_count property and get_affected_items_count() method are not available for the result of this example.
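The following plain Python model illustrates the earlier point that document comparisons are case sensitive. An exact comparison treats "Dog" and "dog" as different values. Comparing case-folded copies of both sides does not. This is only an analogy for the server-side behavior. In a find() condition, lower-casing both sides (for example LOWER(Name) = :animal with a lower-case bound value) would presumably have a similar effect. Treat that expression as an assumption to test, not a documented recipe.

```python
# A few documents from the animals collection, as plain dictionaries.
docs = [{"Name": "Dog"}, {"Name": "Cat"}, {"Name": "Cockatoo"}]


def exact_matches(docs, name):
    """Mimic the default binary comparison: "dog" does not match "Dog"."""
    return [d for d in docs if d["Name"] == name]


def case_insensitive_matches(docs, name):
    """Compare case-folded copies of both sides to ignore case."""
    return [d for d in docs if d["Name"].casefold() == name.casefold()]


print(exact_matches(docs, "dog"))             # []
print(case_insensitive_matches(docs, "dog"))  # [{'Name': 'Dog'}]
```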

It is worth considering how the fields are specified in a bit more detail.

Fields and JSON Paths

The field can either be a JSON path or an expression (possibly referencing one or more fields in the JSON document). A JSON path is created as follows:

The document root is represented by $. Except when specifying fields for an index (more later), including $ in the path is optional. This makes it simple to specify top-level objects, as you just need to use the name as in the previous example.

A dot (.) is used to separate elements. For example, if you have:

{
    "Red": {
        "Pink": "FFC0CB",
        "Pure_red": "FF0000",
        "Maroon": "800000"
    }
}

then to get the hex value for Maroon you can use $.Red.Maroon.

* is a wildcard that means “all members”. The wildcard can also be used as [prefix]**suffix, where the prefix is optional and the suffix is mandatory.

For arrays, square brackets can be used to return an element: [N] returns element N (counting from 0) and [*] returns all elements (the same as not specifying the brackets at all).
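The following sketch shows how the path legs select values. It evaluates a simplified subset of the syntax against a Python document: $, .member, .*, [N], and [*]. The ** wildcard and quoted member names are not handled. eval_path is a hypothetical helper and a simplified model, not MySQL's implementation. Like MySQL, it treats a leading $ as optional.

```python
import re

# One path leg: ".member", ".*", "[N]" or "[*]".
_LEG = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*|\*)|\[(\d+|\*)\]")


def eval_path(doc, path):
    """Return the list of values matched by a simplified JSON path."""
    if path.startswith("$"):
        path = path[1:]
    elif not path.startswith((".", "[")):
        path = "." + path  # a top-level name without $, as in find()
    matches = [doc]
    pos = 0
    while pos < len(path):
        leg = _LEG.match(path, pos)
        if leg is None:
            raise ValueError("unsupported path syntax at %r" % path[pos:])
        member, index = leg.group(1), leg.group(2)
        selected = []
        for value in matches:
            if member is not None:
                if isinstance(value, dict):
                    if member == "*":
                        selected.extend(value.values())  # all members
                    elif member in value:
                        selected.append(value[member])
            elif isinstance(value, list):
                if index == "*":
                    selected.extend(value)  # all elements
                elif int(index) < len(value):
                    selected.append(value[int(index)])  # 0-based element
        matches = selected
        pos = leg.end()
    return matches


colors = {"Red": {"Pink": "FFC0CB", "Pure_red": "FF0000", "Maroon": "800000"}}
dog_doc = {"Species": ["German Shephard", "Labrador Retriever", "Poodle"]}
print(eval_path(colors, "$.Red.Maroon"))   # ['800000']
print(eval_path(dog_doc, "$.Species[1]"))  # ['Labrador Retriever']
```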

There are also several JSON functions that can be used. Let’s take a look at an example of that.

Example Using JSON Function

As a slightly more complicated example, consider a query to find animals that have a species called Siamese. For this the … function will be used. For each matching document, the animal name is returned, and the documents are sorted alphabetically by name.

The code to perform this query is:

stmt = animals.find("JSON_CONTAINS($.Species, :species)")
stmt = stmt.fields("Name").sort("Name")
stmt = stmt.bind("species", '"Siamese"')
result = stmt.execute()

for doc in result.fetch_all():
    print(doc["Name"])

Script: siamse.py

The code adds a filter using the JSON_CONTAINS() function to require the Species array to include a given species. The species value is set using the bind() method. Notice how double quotes have been added around Siamese; this is required, as otherwise the value would not be a JSON string. The statement also chains the fields() and sort() methods to specify which fields to include (just Name in this case) and what to sort the documents by.
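If the species comes from a variable, a convenient way to produce the double-quoted JSON string is json.dumps(). The sketch below wraps the query in a hypothetical find_species_names() helper; it assumes the Connector/Python (mysqlx) statement methods used above (find(), fields(), sort(), bind(), execute(), fetch_all()) and a collection object such as animals.

```python
import json


def find_species_names(collection, species):
    """Return the names of animals whose Species array contains species.

    Hypothetical helper; collection is assumed to be a mysqlx Collection.
    """
    stmt = collection.find("JSON_CONTAINS($.Species, :species)")
    stmt = stmt.fields("Name").sort("Name")
    # json.dumps() turns Siamese into "Siamese" (including the double
    # quotes), i.e. a JSON string, which is what JSON_CONTAINS() expects.
    stmt = stmt.bind("species", json.dumps(species))
    result = stmt.execute()
    return [doc["Name"] for doc in result.fetch_all()]


print(json.dumps("Siamese"))  # "Siamese"
```

With a live session you would call, for example, find_species_names(animals, "Siamese") and print the returned names.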

There are several JSON functions available for querying a document. For a complete list

and description see the MySQL reference manual:

JSON Functions - https://dev.mysql.com/doc/refman/en/json-functions.html

Includes the functions that are neither spatial GeoJSON functions nor aggregate

functions.

Spatial GeoJSON Functions - https://dev.mysql.com/doc/refman/en/spatial-geojson-functions.html

Functions required for spatial (geographical) queries.

Aggregate (GROUP BY) Function Descriptions -

https://dev.mysql.com/doc/refman/en/group-by-functions.html

Includes the JSON_ARRAYAGG() and JSON_OBJECTAGG() functions to create a

JSON array and a JSON object from aggregating values, respectively.


Before continuing to modify documents, let’s take another break. In the examples so far, the collection has only included a few small documents, so there has been no need to consider query plans; a scan of all documents has performed well enough. However, what if you have large collections and/or large documents? Just like in a relational database, it is good practice to index the fields most commonly used for filtering, sorting, and grouping.

Indexes

The MySQL Document Store supports indexes on the collections. This can greatly

improve the query performance for queries that otherwise would need to check a large

number of documents to return a few of them or sort/group based on a given field.

You add an index through the collection object using the create_index() method, which takes two arguments: the index name and a dictionary defining the index. The index definition includes three elements:

type. The index type. Currently normal ordered (B-TREE) indexes and spatial

indexes are supported. Specify INDEX for a normal ordered index and SPATIAL

for a spatial index.

unique. Whether it is a unique index. Currently this must be set to False.

fields. A list of the fields to include in the index.

The fields list includes a dictionary for each field to include in the index. Note that the order matters. Each field dictionary includes some or all of the following elements:

field. The JSON path to the value to index. The $ to specify the root of the

document must be included.

type. Equivalent to the data type for columns in a relational table. For example, to declare that the value is an unsigned integer, use INT UNSIGNED.

required. Whether the value must be present. Specify as a Boolean. Equivalent

to whether NOT NULL is specified.

collation. For string values, the collation to use for comparisons. As of 8.0.12 this is ignored and the collation utf8mb4_0900_ai_ci is always used.

options. For spatial columns this is an integer (1-4) specifying what to do if the value has a dimension higher than two. Use 1 (the default) to reject such values and 2, 3, or 4 to accept them. See https://dev.mysql.com/doc/refman/en/spatial-geojson-functions.html#function_st-geomfromgeojson for details.

srid. The spatial reference system for spatial values. The default is 4326 (the

usual representation of the Earth). Supported values can be found in the SRS_ID

column of the information_schema.ST_SPATIAL_REFERENCE_SYSTEMS table.

This all feels quite complex, so it is worth looking at an example. In the example you will

add an index covering the Name and Class fields. The easiest way to do this is to work

backwards starting with the definition of the fields:

field_name = {
    "field": "$.Name",
    "type": "TEXT(20)",
    "required": True,
    "collation": "utf8mb4_0900_ai_ci",
}

field_class = {
    "field": "$.Class",
    "type": "TEXT(15)",
    "required": False,
    "collation": "utf8mb4_0900_ai_ci",
}

Script: create_index.py

Defining the fields separately is not required, but it can make the index definition easier to manage. For text fields, it is necessary to specify a length. This is not the maximum length allowed for the values, but the number of characters that are included in the index.

You can then create the index as:

index_def = {
    "fields": [field_name, field_class],
    "type": "INDEX",
}

animals.create_index("Name_Class", index_def)

Script: create_index.py

So, there is a little more work to do here compared to adding an index in a relational

table. The reason for this is that the JSON document is schemaless, so it is essentially

necessary to define the schema for the fields together with the index.
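Because the index definition effectively carries the schema for the indexed fields, it can help to check definitions before sending them to the server. The sketch below uses a hypothetical make_index_def() helper that encodes the rules described above (type INDEX or SPATIAL, $-rooted paths, an explicit length for TEXT). It is an illustration, not part of the X DevAPI, and the rules may change in later MySQL versions.

```python
def make_index_def(fields, index_type="INDEX"):
    """Build (and sanity-check) a definition for Collection.create_index().

    Hypothetical helper. "unique" is left out because, as described above,
    only False is currently supported.
    """
    if index_type not in ("INDEX", "SPATIAL"):
        raise ValueError("type must be INDEX or SPATIAL")
    if not fields:
        raise ValueError("at least one field is required")
    for field in fields:
        # For index definitions the path must include the $ root.
        if not field.get("field", "").startswith("$"):
            raise ValueError("field path must start with $: %r" % (field,))
        # Text values need an explicit index length, e.g. TEXT(20).
        field_type = field.get("type", "")
        if field_type.upper().startswith("TEXT") and "(" not in field_type:
            raise ValueError("TEXT fields need a length, e.g. TEXT(20)")
    return {"fields": list(fields), "type": index_type}


name_field = {"field": "$.Name", "type": "TEXT(20)", "required": True}
class_field = {"field": "$.Class", "type": "TEXT(15)", "required": False}
definition = make_index_def([name_field, class_field])
print(definition["type"], [f["field"] for f in definition["fields"]])
# INDEX ['$.Name', '$.Class']
```

With a live session, the returned dictionary can be passed to animals.create_index() in the same way as index_def.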

It can be interesting to see how the index looks in the definition of the underlying table. For this you need to switch to SQL mode in MySQL Shell and use the SHOW CREATE TABLE statement (the output is reformatted to account for the limited width of this document):

mysql-py> \sql
Switching to SQL mode... Commands end with ;
mysql-sql> SHOW CREATE TABLE animals\G
*************************** 1. row ***************************
       Table: animals
Create Table: CREATE TABLE `animals` (
  `doc` json DEFAULT NULL,
  `_id` varbinary(32)
      GENERATED ALWAYS AS
      (json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED
      NOT NULL,
  `$ix_t20_r_4CB1E32CCBE4FE2585D3C8F059CB3A909FC536B7` text
      GENERATED ALWAYS AS
      (json_unquote(json_extract(`doc`,_utf8mb4'$.Name'))) VIRTUAL
      NOT NULL,
  `$ix_t15_4E46A273752C805EE33D5BF43813E81B9716DC4D` text
      GENERATED ALWAYS AS
      (json_unquote(json_extract(`doc`,_utf8mb4'$.Class'))) VIRTUAL,
  PRIMARY KEY (`_id`),
  KEY `Name_Class`
      (`$ix_t20_r_4CB1E32CCBE4FE2585D3C8F059CB3A909FC536B7`(20),
       `$ix_t15_4E46A273752C805EE33D5BF43813E81B9716DC4D`(15))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

mysql-sql> \py


The MySQL Document Store implements the index using a generated virtual column, similar to how the primary key was implemented, except that the primary key uses a stored generated column. The column name is auto-generated such that MySQL can reuse the same column for multiple indexes. When an index is dropped, MySQL does not drop the column until no more indexes require it.

Tip: Because comparisons using an index are done via the generated column, which for text columns has a character set and collation, the comparisons are case insensitive.

With the index created, MySQL will automatically use it when the optimizer determines it is useful for the query. As for SQL tables, indexes can help find rows more quickly, but they come with the additional cost of maintaining the index, and the optimizer has more choices to evaluate when looking for the cheapest query plan.

Tip: Ensure you have switched back to Python mode (the \py command) before

proceeding.

The next step is to update a document.

Updating Documents

Updating documents is a little more complicated than updating a row in an SQL table.

The reason is that it is necessary to specify what part of the document to update and

that values are not necessarily scalar values.

There are three methods to change documents:

add_or_replace_one(): For upserting a document by specifying the document

ID. If there is already a document with the ID, then it is replaced, otherwise a

new document is added with the ID.

modify(): The most advanced method, where you can modify one or more documents. It can take most of the arguments known from find queries to specify which documents to change, and additionally there are methods to specify which change to make.

replace_one(): Similar to add_or_replace_one() but if the document ID does

not already exist, then no change occurs.
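The difference between add_or_replace_one() and replace_one() is easiest to see with a simple model. The sketch below uses a plain dictionary keyed by document ID as a stand-in for a collection; the function names mirror the X DevAPI methods, but these are hypothetical in-memory helpers, not the real API, and they return 1 when the model changed and 0 otherwise.

```python
def add_or_replace_one(store, doc_id, doc):
    """Upsert model: replace the document with this ID, or add it if missing."""
    store[doc_id] = dict(doc, _id=doc_id)
    return 1


def replace_one(store, doc_id, doc):
    """Replace model: only change an existing document with this ID."""
    if doc_id not in store:
        return 0  # no document with this ID, so no change occurs
    store[doc_id] = dict(doc, _id=doc_id)
    return 1


store = {"1": {"_id": "1", "Name": "Cat"}}
print(replace_one(store, "2", {"Name": "Dog"}))         # 0
print(add_or_replace_one(store, "2", {"Name": "Dog"}))  # 1
print(sorted(store))                                    # ['1', '2']
```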

The method that will be demonstrated here is modify(). As mentioned, you need to use a sub-method to specify how to modify the matching documents. The available modify methods are:

array_append(): Takes an existing array element and changes it to include the original value plus the new value(s). For example, if the original array is ["One", "Two", "Three"] and you append [1, "Uno"] to the element with the value "One", the resulting array is [["One", [1, "Uno"]], "Two", "Three"].

array_insert(): Inserts a new element into an existing array at the position specified. For example, if the existing document is {"fruits": ["apple", "orange", "watermelon"]} and the position is specified as fruits[1] with the new element being "kiwi", then the resulting document is {"fruits": ["apple", "kiwi", "orange", "watermelon"]}.

set(): If the specified element exists, update it; otherwise add the element.

patch(): Takes a partial document and merges it into the existing document. This supports adding, replacing, or removing parts of a document. This is a very powerful yet simple method.

unset(): Removes an element from the document.
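The two array examples above can be reproduced on plain Python lists. The array_append() and array_insert() functions below are hypothetical helpers that model the described behaviour for a single array index (the real methods take a document path). The append case follows the autowrapping rule of MySQL's JSON_ARRAY_APPEND() function: an existing array is extended, and a scalar is wrapped in a new array.

```python
import copy


def array_append(array, index, value):
    """Model of array_append(): add value to the element at array[index]."""
    result = copy.deepcopy(array)
    old = result[index]
    # An existing array is extended; a scalar is wrapped in a new array.
    result[index] = old + [value] if isinstance(old, list) else [old, value]
    return result


def array_insert(array, index, value):
    """Model of array_insert(): insert value at index, shifting later items."""
    result = copy.deepcopy(array)
    result.insert(index, value)
    return result


print(array_append(["One", "Two", "Three"], 0, [1, "Uno"]))
# [['One', [1, 'Uno']], 'Two', 'Three']
print(array_insert(["apple", "orange", "watermelon"], 1, "kiwi"))
# ['apple', 'kiwi', 'orange', 'watermelon']
```

Both helpers copy the input first, so the original list is left unchanged, which makes before/after comparisons easy.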

Note: It is mandatory to set a filter condition when calling modify(). If you omit it, an ArgumentError (in MySQL Shell in Python mode) or a TypeError (in Connector/Python) exception is raised.

If you really want to modify all documents, specify "True" or similar as the condition;

this also makes it clear when you look at the code later on that you meant to match all

documents.

Let’s look at two examples: using set() to add/replace a given field and patch() to

add, change, and remove fields.

Modifying Documents with set()

In this example the class of the cat will be changed to “Mammalia” (from the current

“Mammal”). This requires a filter to find documents with Name set to Cat and then

specify the new value for Class:


stmt = animals.modify("Name = :name").set("Class", "Mammalia")
result = stmt.bind("name", "Cat").execute()
print("Number of documents changed: {0}".format(
    result.get_affected_items_count()))
session.commit()

Script: modify_set.py

The filter is set in the same way as when find() was used to read data. The output will

show that 1 document has been updated.

Modifying Documents with patch()

The patch() method is the most powerful of the modify methods; however, it is also simple. The method takes a JSON document (which may be just part of the document you want to modify). Each element in the provided document is added, modified, or removed depending on whether the element already exists and what the new value is; a null value removes the element.

As an example consider the animals collection. The existing value for the document with

Name = Dog is:

mysql-py> animals.find("Name = 'Dog'")
[
    {
        "Claim": "Man's best friend",
        "Class": "Mammal",
        "Name": "Dog",
        "Species": [
            "German Shephard",
            "Poodle"
        ],
        "_id": "00005b95a8cc000000000000000d"
    }
]
1 document in set (0.0008 sec)

The patch that will be provided is:

dog_patch = {
    "Class": "Mammalia",
    "Lifespan": {
        "From": 10,
        "To": 15,
        "Record": 27
    },
    "Claim": None,
}

Script: modify_patch.py

Tip: You can also write the patch document as a string. In that case, you need to specify

the lack of a value as null instead of None.

You can pass this to the patch() method:


patch_stmt = animals.modify("Name = :name").patch(dog_patch)
patch_result = patch_stmt.bind("name", "Dog").execute()
print("Number of documents changed: {0}".format(
    patch_result.get_affected_items_count()))
session.commit()


The new dog document is:

mysql-py> animals.find("Name = 'Dog'")
[
    {
        "Class": "Mammalia",
        "Lifespan": {
            "From": 10,
            "Record": 27,
            "To": 15
        },
        "Name": "Dog",
        "Species": [
            "German Shephard",
            "Poodle"
        ],
        "_id": "00005b95a8cc000000000000000d"
    }
]
1 document in set (0.00312 sec)


So, the class has been updated, the lifespan added, and the claim removed.
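As far as this example shows, patch() behaves like JSON Merge Patch (RFC 7396), the same rules used by MySQL's JSON_MERGE_PATCH() function: members in the patch replace or add members, nested objects are merged recursively, and null removes a member. The sketch below is a minimal, hypothetical merge_patch() helper that reproduces the dog update in plain Python; it also shows that serializing the patch with json.dumps() turns None into null, as mentioned in the tip about writing the patch as a string.

```python
import copy
import json


def merge_patch(target, patch):
    """Apply patch to target following RFC 7396; return a new value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target entirely.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # None (JSON null) removes the member
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# The dog document and patch as listed above.
dog = {
    "Claim": "Man's best friend",
    "Class": "Mammal",
    "Name": "Dog",
    "Species": ["German Shephard", "Poodle"],
    "_id": "00005b95a8cc000000000000000d",
}
dog_patch = {
    "Class": "Mammalia",
    "Lifespan": {"From": 10, "To": 15, "Record": 27},
    "Claim": None,
}

new_dog = merge_patch(dog, dog_patch)
print(json.dumps(new_dog, indent=4, sort_keys=True))

# The string form of the patch uses null where the dict uses None:
print(json.dumps(dog_patch))
# {"Class": "Mammalia", "Lifespan": {"From": 10, "To": 15, "Record": 27}, "Claim": null}
```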

The final CRUD operation available is delete.

Deleting Documents

There are two methods to remove documents: remove() and remove_one(). The remove() method finds the documents to delete in the same way as the find() method does, whereas remove_one() finds a single document by its ID.

As an example, delete the first document sorted alphabetically by name where the class

is “Mammalia”:


stmt = animals.remove("Class = :class").sort("Name").limit(1)
result = stmt.bind("class", "Mammalia").execute()
print("Number of documents removed: {0}".format(
    result.get_affected_items_count()))
session.commit()

Script: remove.py

This removes the cat document. As for the modify() method, it is mandatory to specify a filter condition with the remove() method; again, you can set the filter to "True" if you really want to delete all documents.
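The remove_one() method is not used in the lab, so here is a short sketch of it. It assumes Connector/Python's Collection.remove_one(doc_id) and the result's get_affected_items_count() method; remove_by_id() is a hypothetical helper name, and, as in remove.py, you still need to commit if a transaction is open.

```python
def remove_by_id(collection, doc_id):
    """Remove the document with the given _id and return the affected count.

    Hypothetical helper; collection is assumed to be a mysqlx Collection.
    """
    result = collection.remove_one(doc_id)
    # Expected to be 1 if a document with this ID existed, 0 otherwise.
    return result.get_affected_items_count()


# Usage in the lab (requires a live session and the animals collection):
#   removed = remove_by_id(animals, "00005b95a8cc000000000000000d")
#   session.commit()
```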

This concludes the Python part of the lab. However, what about the other languages? A

nice thing about the X DevAPI is that it is relatively easy to learn to use it in another

language. Let’s finish off by looking at that.

Comparing Connectors

The last section will implement the same query in Connector/Python,

Connector/Node.js, and Connector/J. It will query the countryinfo collection in the

world_x database and retrieve the information about the document with the document

ID set to USA.

Tip: If you have been experimenting with the world_x schema and want to reset it, you

can do it with the reset_world_x.sh script.

To make it easier to compare the examples directly, the connection arguments are

included directly in the examples. This is not recommended for real programs!

Important: Always store the connection arguments and particularly the password

outside of the application source code.
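One way to follow this advice is to read the connection arguments from the environment. In the sketch below, connect_args_from_env() and the variable names MYSQLX_HOST, MYSQLX_PORT, MYSQLX_USER, and MYSQLX_PASSWORD are hypothetical; the returned dictionary has the same keys as connect_args in the examples, so it can be passed to mysqlx.get_session() with **.

```python
import os


def connect_args_from_env(env=None):
    """Build keyword arguments for mysqlx.get_session() from environment
    variables (hypothetical names) instead of hard-coding them."""
    env = os.environ if env is None else env
    return {
        "host": env.get("MYSQLX_HOST", "localhost"),
        "port": int(env.get("MYSQLX_PORT", "33060")),
        "user": env["MYSQLX_USER"],          # required
        "password": env["MYSQLX_PASSWORD"],  # required; never in the source
    }


# Usage with Connector/Python:
#   session = mysqlx.get_session(**connect_args_from_env())
```

A missing user or password raises KeyError immediately, which is usually preferable to silently connecting with defaults.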

Each example produces the same output:

_id ............: USA
Name ...........: United States

Geography:
Continent ......: North America
Region .........: North America
Surface Area ...: 9363520

MySQL Connector/Python

The MySQL Connector/Python example will be familiar after trying the examples

previously in the lab.

#!/usr/bin/env python
import mysqlx

connect_args = {
    "user": "hol1703",
    "password": "hol@OOW18",
    "host": "localhost",
    "port": 33060,
}

session = mysqlx.get_session(**connect_args)
schema = session.get_schema("world_x")
countryinfo = schema.get_collection("countryinfo")

stmt = countryinfo.find("_id = :country_code")
stmt.fields("_id", "Name", "geography")
stmt.bind("country_code", "USA")
result = stmt.execute()

usa = result.fetch_one()
id = usa["_id"]
name = usa["Name"]
continent = usa["geography"]["Continent"]
region = usa["geography"]["Region"]
surface_area = usa["geography"]["SurfaceArea"]

print("_id ............: {0}".format(id))
print("Name ...........: {0}".format(name))
print("\nGeography:")
print("Continent ......: {0}".format(continent))
print("Region .........: {0}".format(region))
print("Surface Area ...: {0}".format(surface_area))

session.close()

Script: countryinfo.py
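If you want to check the output formatting without a database, the printing can be moved into a pure function. format_country() below is a hypothetical refactoring of countryinfo.py; it assumes a document with the _id, Name, and geography fields selected above, and the sample values are those shown in the expected output.

```python
def format_country(doc):
    """Return the report lines printed by countryinfo.py (hypothetical)."""
    geo = doc["geography"]
    return [
        "_id ............: {0}".format(doc["_id"]),
        "Name ...........: {0}".format(doc["Name"]),
        "",  # the original prints "\nGeography:", i.e. a blank line first
        "Geography:",
        "Continent ......: {0}".format(geo["Continent"]),
        "Region .........: {0}".format(geo["Region"]),
        "Surface Area ...: {0}".format(geo["SurfaceArea"]),
    ]


usa = {
    "_id": "USA",
    "Name": "United States",
    "geography": {
        "Continent": "North America",
        "Region": "North America",
        "SurfaceArea": 9363520,
    },
}
print("\n".join(format_country(usa)))
```

In countryinfo.py you would call print("\n".join(format_country(usa))) on the document returned by fetch_one().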

MySQL Connector/Node.js

Tip: HOL1706 on Tuesday, Oct 23, 11:15 a.m. - 12:15 p.m. is dedicated to

Connector/Node.js and the Document Store.

Since Node.js uses asynchronous execution, the code is a bit different, but should still

look familiar.

#!/usr/bin/env node
'use strict';

const mysqlx = require('@mysql/xdevapi');

const mysqlArgs = {
    host: 'localhost',
    port: 33060,
    password: 'hol@OOW18',
    user: 'hol1703',
};

(async function() {
    let session;
    let docs = [];

    function storeResult(doc) {
        docs.push(doc);
    }

    try {
        session = await mysqlx.getSession(mysqlArgs);
        const schema = session.getSchema('world_x');
        const countryinfo = schema.getCollection('countryinfo');

        const stmt = countryinfo.find('_id = :country_code')
            .fields('_id', 'Name', 'geography');
        stmt.bind('country_code', 'USA');
        const result = await stmt.execute(storeResult);

        const usa = docs[0];
        const id = usa['_id'];
        const name = usa['Name'];
        const continent = usa['geography']['Continent'];
        const region = usa['geography']['Region'];
        const surfaceArea = usa['geography']['SurfaceArea'];

        console.log(`_id ............: ${id}`);
        console.log(`Name ...........: ${name}`);
        console.log('\nGeography:');
        console.log(`Continent ......: ${continent}`);
        console.log(`Region .........: ${region}`);
        console.log(`Surface Area ...: ${surfaceArea}`);
    } catch (err) {
        console.error(err.message);
    } finally {
        session && await session.close();
    }
})();

Script: countryinfo.js

MySQL Connector/J

The main difference when using Connector/J is that Java is a statically typed language, so it is necessary to explicitly declare the type of each variable. The example also passes the connection arguments as a URI instead of an object. An alternative would be to use java.util.Properties.

import com.mysql.cj.xdevapi.Session;
import com.mysql.cj.xdevapi.SessionFactory;
import com.mysql.cj.xdevapi.Schema;
import com.mysql.cj.xdevapi.Collection;
import com.mysql.cj.xdevapi.FindStatement;
import com.mysql.cj.xdevapi.DocResult;
import com.mysql.cj.xdevapi.DbDoc;
import com.mysql.cj.xdevapi.JsonParser;
import com.mysql.cj.xdevapi.JsonValue;
import com.mysql.cj.xdevapi.JsonNumber;
import com.mysql.cj.xdevapi.JsonString;

public class countryinfo {
    public static void main(String[] args) {
        Session session = new SessionFactory().getSession(
            "mysqlx://localhost:33060?user=hol1703&password=hol@OOW18");
        Schema schema = session.getSchema("world_x");
        Collection countryinfo = schema.getCollection("countryinfo");

        FindStatement stmt = countryinfo.find("_id = :country_code");
        stmt.fields("_id AS _id, Name AS Name, geography AS geography");
        stmt.bind("country_code", "USA");
        DocResult result = stmt.execute();

        DbDoc usa = result.fetchOne();
        String id = ((JsonString) usa.get("_id")).getString();
        String name = ((JsonString) usa.get("Name")).getString();
        DbDoc geo = (DbDoc) usa.get("geography");
        String continent = ((JsonString) geo.get("Continent")).getString();
        String region = ((JsonString) geo.get("Region")).getString();
        Integer surfaceArea = ((JsonNumber) geo.get("SurfaceArea")).getInteger();

        System.out.println("_id ............: " + id);
        System.out.println("Name ...........: " + name);
        System.out.println("\nGeography:");
        System.out.println("Continent ......: " + continent);
        System.out.println("Region .........: " + region);
        System.out.println("Surface Area ...: " + surfaceArea);

        session.close();
    }
}

Script: countryinfo.java

You can compile and execute the program like this:

shell$ cd ~/bin
shell$ javac -classpath /usr/share/java/mysql-connector-java-8.0.12.jar countryinfo.java
shell$ java -classpath /usr/share/java/mysql-connector-java-8.0.12.jar:/usr/share/java/protobuf.jar:. countryinfo

This concludes the guided part of the lab. You are encouraged to continue playing with the MySQL Document Store and to try MySQL Connector/Node.js and MySQL Connector/J in more detail.

References

If you want to learn more about the MySQL Document Store, the following references

are useful:

X DevAPI User Guide: https://dev.mysql.com/doc/x-devapi-userguide/en/

MySQL Connector/Python X DevAPI Reference Documentation:

https://dev.mysql.com/doc/dev/connector-python/8.0/

MySQL Connector/Node.js with X DevAPI:

https://dev.mysql.com/doc/dev/connector-nodejs/8.0/

MySQL Shell Documentation – Python API:

https://dev.mysql.com/doc/dev/mysqlsh-api-python/8.0/


MySQL Shell Documentation – JavaScript API:

https://dev.mysql.com/doc/dev/mysqlsh-api-javascript/8.0/

MySQL Connector/J X DevAPI Reference:

https://dev.mysql.com/doc/dev/connector-j/8.0/

MySQL Shell 8.0 Manual: https://dev.mysql.com/doc/mysql-shell/8.0/en/
