A Practical Introduction to the MySQL Document Store [HOL1703]
Jesper Wisborg Krogh
Senior Principal Technical Support Engineer
Lig Isler-Turmelle
Principal Technical Support Engineer
Prerequisites
Note: This is not required when using the laptops at HOL1703 at Oracle OpenWorld
2018.
It is assumed the following software is already installed on the machine where you try
the examples in this workbook. If you are attending the hands-on labs session A
Practical Introduction to the MySQL Document Store [HOL1703] at Oracle OpenWorld
2018, you do not need to do anything, and you can skip to the next section.
Software List
MySQL Server 8.0.12 or later
MySQL Shell 8.0.12 or later
MySQL Connector/Python 8.0.12 or later
MySQL Connector/Node.JS 8.0.12 or later
MySQL Connector/Java 8.0.12 or later
Node.JS
Data List
The world database: http://downloads.mysql.com/docs/world.sql.gz
The world_x database: http://downloads.mysql.com/docs/world_x-db.tar.gz
An empty hol1703 schema
Installing the Software
The following steps can be used to install the required software on Oracle Linux 7. For
other platforms the steps will be different.
# Install the latest public yum repo
# and enable the required ol7_developer repos:
cd /etc/yum.repos.d/
mv public-yum-ol7.repo public-yum-ol7.repo.bak
wget http://yum.oracle.com/public-yum-ol7.repo
yum-config-manager --enable ol7_developer_nodejs8
yum-config-manager --enable ol7_developer_EPEL
# Install MySQL Repo
wget https://dev.mysql.com/get/mysql80-community-release-el7-1.noarch.rpm
yum install mysql80-community-release-el7-1.noarch.rpm
# Install Python pip
yum install python-pip python-wheel
pip install --upgrade pip
# Install MySQL Server, Shell, and connectors
yum install mysql-community-client \
mysql-community-common \
mysql-community-devel \
mysql-community-libs \
mysql-community-libs-compat \
mysql-community-server \
mysql-connector-java-8.0.* \
mysql-shell \
java-1.8.0-openjdk-devel \
nodejs-8* \
protobuf-java
# Start MySQL Server for the first time
# (initializes the data directory)
# and update the root password.
systemctl start mysqld
passwd=$(grep 'A temporary password is generated for root@localhost' \
    /var/log/mysqld.log | sed -re 's/^.* (.+)$/\1/')
mysql --user=root --password="${passwd}" --connect-expired-password -e \
    "SET PASSWORD = '<some secure password>'"
unset passwd
Note: It is recommended not to put the password on the command line. The above is
done for simplicity.
Installing Data
The following instructions assume the previously mentioned software has been
installed, that MySQL Server has been started, and that the root password has been
updated (by default it is set to an expired temporary password, which can be found in
/var/log/mysqld.log after MySQL has been started for the first time).
The data needed for this hands-on lab can be installed following these steps:
wget http://downloads.mysql.com/docs/world.sql.zip
wget http://downloads.mysql.com/docs/world_x-db.zip
unzip world.sql.zip
unzip world_x-db.zip
mysql --user=root --password \
--execute "SOURCE world.sql; SOURCE world_x-db/world_x.sql;"
HOL Information
The following information is useful for the hands-on lab session at Oracle OpenWorld
2018:
Linux Username: lab
Linux Password: oracle
MySQL Username: hol1703
MySQL Password: hol@OOW18
MySQL Schemas: hol1703, world, world_x
Connectors and APIs Installed:
MySQL Connector/Python 8.0.12: mysql.connector (PEP 249 Python DB API) and mysqlx (X DevAPI, for the Document Store)
MySQL Connector/Node.js 8.0.12: @mysql/xdevapi
MySQL Connector/J 8.0.12: java.sql.*
Documentation: This workbook as well as documentation for the X DevAPI, MySQL
Connector/Python, MySQL Connector/Node.js, and MySQL Shell can be found in the
/home/lab/docs directory. Firefox is set to open with this information as well as an
overview of the available documentation as the home page.
This hands-on lab will primarily use Python for the examples. However, feel free to use
JavaScript (Node.js) or Java instead if you prefer. HOL1706, scheduled for Tuesday
11:15am to 12:15pm in this room, is exclusively about Node.js and the MySQL
Document Store.
Tip: A login path has been configured for the hol1703 user, so it is not necessary to
specify the username and password when using the mysql command-line client and
MySQL Shell.
The MySQL Document Store
The MySQL Document Store was developed throughout the MySQL Server 5.7 lifetime.
The server side is implemented through the X Plugin (called mysqlx in the
information_schema.PLUGINS view), which was first introduced as a beta release with
MySQL Server 5.7.12. The X Plugin reached general availability (GA) status with MySQL
Server 8.0.11 and is now a built-in plugin that is enabled by default. That is, on the
server side you do not need to do anything to start using the MySQL Document Store.
mysql> SELECT *
FROM information_schema.PLUGINS
WHERE PLUGIN_NAME = 'mysqlx'\G
*************************** 1. row ***************************
PLUGIN_NAME: mysqlx
PLUGIN_VERSION: 1.0
PLUGIN_STATUS: ACTIVE
PLUGIN_TYPE: DAEMON
PLUGIN_TYPE_VERSION: 80012.0
PLUGIN_LIBRARY: NULL
PLUGIN_LIBRARY_VERSION: NULL
PLUGIN_AUTHOR: Oracle Corp
PLUGIN_DESCRIPTION: X Plugin for MySQL
PLUGIN_LICENSE: GPL
LOAD_OPTION: ON
1 row in set (0.00 sec)
The MySQL Document Store consists of the following components:
X Plugin: This is the server-side plugin that provides support for the X DevAPI.
X Protocol: The protocol used for an application to communicate with the X
Plugin.
The X DevAPI: The API used with the X Protocol.
Collectively these components are known as the MySQL Document Store.
X Plugin Port and Other Variables
Because the X Plugin uses a different protocol from the traditional MySQL protocol, it
needs to listen on a different port than the usual one. By default, the X Plugin uses
port 33060. This can be configured using the mysqlx_port option. Similarly, if you want
to connect using a UNIX socket file, you need a separate socket file.
All of the variables for the X Plugin are prefixed mysqlx_. The complete list of variables
with their default values (using the MySQL Server RPM for Oracle Linux/RHEL 7) is:
mysql> SELECT *
FROM performance_schema.global_variables
WHERE VARIABLE_NAME LIKE 'mysqlx%';
+-----------------------------------+-----------------------------+
| VARIABLE_NAME | VARIABLE_VALUE |
+-----------------------------------+-----------------------------+
| mysqlx_bind_address | * |
| mysqlx_connect_timeout | 30 |
| mysqlx_document_id_unique_prefix | 0 |
| mysqlx_idle_worker_thread_timeout | 60 |
| mysqlx_interactive_timeout | 28800 |
| mysqlx_max_allowed_packet | 67108864 |
| mysqlx_max_connections | 100 |
| mysqlx_min_worker_threads | 2 |
| mysqlx_port | 33060 |
| mysqlx_port_open_timeout | 0 |
| mysqlx_read_timeout | 30 |
| mysqlx_socket | /var/run/mysqld/mysqlx.sock |
| mysqlx_ssl_ca | |
| mysqlx_ssl_capath | |
| mysqlx_ssl_cert | |
| mysqlx_ssl_cipher | |
| mysqlx_ssl_crl | |
| mysqlx_ssl_crlpath | |
| mysqlx_ssl_key | |
| mysqlx_wait_timeout | 28800 |
| mysqlx_write_timeout | 60 |
+-----------------------------------+-----------------------------+
21 rows in set (0.00 sec)
Several of the options have counterparts for the old MySQL protocol; for these, the
variable names are the same except for the mysqlx_ prefix. For the purpose of this
hands-on lab, the default values can be used.
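As a minimal sketch of how the default port is used from a program, the settings for an X Protocol connection can be collected in a dictionary. The helper name x_connection_settings() is hypothetical, and the dictionary keys (host, port, user, password, schema) are the ones that mysqlx.get_session() in MySQL Connector/Python is assumed to accept. The connect call itself is left as a comment so the block runs without a server.

```python
# Default X Plugin port, as shown by the mysqlx_port variable.
DEFAULT_X_PORT = 33060


def x_connection_settings(user, password, host="localhost",
                          port=DEFAULT_X_PORT, schema=None):
    """Return a settings dictionary for an X Protocol connection.

    Hypothetical helper for illustration; it only builds the dictionary.
    """
    settings = {
        "host": host,
        "port": port,  # the X Protocol port, not the classic protocol port
        "user": user,
        "password": password,
    }
    if schema is not None:
        settings["schema"] = schema
    return settings


settings = x_connection_settings("hol1703", "<password>", schema="hol1703")
print(settings["port"])

# With MySQL Connector/Python installed, the dictionary could be used as:
#   import mysqlx
#   session = mysqlx.get_session(settings)
```

If mysqlx_port has been changed on the server, pass the new value through the port argument.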
The X DevAPI
From an end user perspective, the most interesting part of the MySQL Document Store
is the X DevAPI. This is the API used to interact with the MySQL Document Store from
your programs and from MySQL Shell.
The X DevAPI is designed from the ground up with modern-day usage in mind. It is
available for a range of languages, for example: Python (MySQL Connector/Python),
JavaScript (MySQL Connector/Node.js), PHP (mysql_xdevapi PECL extension), Java
(MySQL Connector/J), C++ (MySQL Connector/C++), and .NET (MySQL Connector/NET).
The X DevAPI is uniform across the supported programming languages while still
maintaining the characteristics of each language. For example, the method to get a
session is get_session() in Python but getSession() in Node.js.
The X DevAPI has three different parts. Which part you should use depends on how you
want to interact with MySQL:
Collections: The create-read-update-delete (CRUD) methods to work with JSON
documents, i.e. using MySQL as a document store. This is a NoSQL API.
SQL Tables: The CRUD methods to work with SQL (relational) tables. This is a
NoSQL API.
SQL: This can be used to execute arbitrary SQL statements against both
collections and SQL tables.
The easiest way to try the X DevAPI is to use MySQL Shell. MySQL Shell is a relatively
new command-line tool that supports not only SQL statements but also Python and
JavaScript. This makes it possible to test code before implementing it in an actual
program, or to use MySQL Shell to execute scripts that include Python or JavaScript
routines.
Note: Python and JavaScript in MySQL Shell do not use the connectors, so while the API
is the same, there are some differences in their use.
Lab Exercises
This lab will primarily explore the MySQL Document Store using MySQL Shell. This allows
you to explore the X DevAPI while still having easy access to SQL statements, for
example to look at the underlying table definition. The examples in this section use
Python, but can equally well be executed using JavaScript. If you choose to use
JavaScript, remove the underscores in the method names and make the letter after each
underscore upper case. For example, replace the get_session() method with getSession().
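The renaming rule can be written down as a small helper. This is only a sketch of the rule as described here; the function name to_javascript_name() is hypothetical and is not part of MySQL Shell or the connectors.

```python
def to_javascript_name(python_name):
    """Convert a Python-style X DevAPI method name to JavaScript style.

    Underscores are removed and the letter following each underscore
    is made upper case, for example get_session -> getSession.
    """
    first, *rest = python_name.split("_")
    return first + "".join(word[:1].upper() + word[1:] for word in rest)


for name in ("get_session", "create_collection", "fetch_one"):
    print("{0} -> {1}".format(name, to_javascript_name(name)))
```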
Tip: Most of the Python, JavaScript, and Java examples are also available as MySQL
Connector/Python source code in the /home/lab/bin directory. When this is the case,
the file name will be visible from the caption to the example.
Start MySQL Shell
MySQL Shell is started using the mysqlsh command in the terminal. Optionally specify
the language mode you want to use. The language mode can also be set after starting
MySQL Shell. The following table shows how to specify the language mode on the
command-line or the MySQL shell prompt.
Language Mode    Command-Line    MySQL Shell Prompt
JavaScript       --js            \js
Python           --py            \py
SQL              --sql           \sql
The default language mode is JavaScript. The exercises in this lab will include examples
of changing the language mode.
Tip: When you switch language mode, you can keep using your existing connection if
you are already connected to a MySQL instance.
For now, start MySQL Shell without any arguments:
[lab@localhost ~]$ mysqlsh
MySQL Shell 8.0.12
Copyright (c) 2016, 2018, Oracle and/or its affiliates. All rights
reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type '\help' or '\?' for help; '\quit' to exit.
mysql-js>
You can now change to Python mode using the \py command:
mysql-js> \py
Switching to Python mode...
mysql-py>
You can connect to MySQL using the \connect command:
mysql-py> \connect hol1703@localhost
Creating a session to 'hol1703@localhost'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 12 (X protocol)
Server version: 8.0.12 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
mysql-py> session
<Session:hol1703@localhost>
No password is required as MySQL Shell has been set up to fetch the password from a
login path. If this has not been done, MySQL Shell will interactively ask for the password
and offer to store it for you.
The session can be accessed through the session object. This can be useful, for example,
to fetch a schema or to control transactions. You will see examples of this later in the
lab.
You can now set the default schema (this can also be done when creating the
connection) using the \use command:
mysql-py> \use hol1703
Default schema `hol1703` accessible through db.
Notice how MySQL Shell assigned the hol1703 schema to the db object. You can now
use the db object to access the schema specific methods.
Before continuing, let’s pause for a moment to consider the prompt.
The Prompt
The default prompt includes information about the connection, whether it uses SSL, the
default schema, and the language mode. An example of the default prompt can be seen
in the following figure:
MySQL Shell in the hands-on lab virtual machine has been set up to use the
Powerline+Awesome fonts. This gives a prompt with the same information as the
default prompt but using some additional custom characters:
It is beyond the scope of this lab session to go through the installation of the Powerline
and Awesome fonts. If you are interested, you can see an example of installing the
required fonts in https://mysql.wisborg.dk/2018/09/04/awesome-mysql-shell-prompt/.
Before continuing to learn how to work with the MySQL Document Store from MySQL
Shell, let’s look at what you can do if you need help.
Built-In Help
Another great feature of MySQL Shell is the ability to obtain help directly within the
shell. This is not limited to the standard --help command-line argument; help is also
available inside MySQL Shell, including for each object type.
You can get general help, for example about the commands available (you have already
used the \py, \connect, and \use commands):
mysql-py> \?
The Shell Help is organized in categories and topics. To get help for a
specific category or topic use: \? <pattern>
The <pattern> argument should be the name of a category or a topic.
The pattern is a filter to identify topics for which help is required, it can
use the following wildcards:
- ? matches any single charecter.
- * matches any character sequence.
The following are the main help categories:
- AdminAPI       Introduces to the dba global object and the InnoDB cluster
                 administration API.
- Shell Commands Provides details about the available built-in shell commands.
- ShellAPI       Contains information about the shell and util global objects
                 as well as the mysql module that enables executing SQL on
                 MySQL Servers.
- SQL Syntax     Entry point to retrieve syntax help on SQL statements.
- X DevAPI       Details the mysqlx module as well as the capabilities of the
                 X DevAPI which enable working with MySQL as a Document Store
The available topics include:
- The dba global object and the classes available at the AdminAPI.
- The mysqlx module and the classes available at the X DevAPI.
- The mysql module and the global objects and classes available at the
ShellAPI.
- The functions and properties of the classes exposed by the APIs.
- The available shell commands.
- Any word that is part of an SQL statement.
SHELL COMMANDS
The shell commands allow executing specific operations including updating the
shell configuration.
The following shell commands are available:
- \                   Start multi-line input when in SQL mode.
- \connect    (\c)    Connects the shell to a MySQL server and assigns the
                      global session.
- \exit               Exits the MySQL Shell, same as \quit.
- \help       (\?,\h) Prints help information about a specific topic.
- \history            View and edit command line history.
- \js                 Switches to JavaScript processing mode.
- \nowarnings (\w)    Don't show warnings after every statement.
- \option             Allows working with the available shell options.
- \py                 Switches to Python processing mode.
- \quit       (\q)    Exits the MySQL Shell.
- \reconnect          Reconnects the global session.
- \rehash             Refresh the autocompletion cache.
- \source     (\.)    Loads and executes a script from a file.
- \sql                Switches to SQL processing mode.
- \status     (\s)    Print information about the current global session.
- \use        (\u)    Sets the active schema.
- \warnings   (\W)    Show warnings after every statement.
GLOBAL OBJECTS
The following modules and objects are ready for use when the shell starts:
- db      Used to work with database schema objects.
- dba     Used for InnoDB cluster administration.
- mysql   Support for connecting to MySQL servers using the classic MySQL
          protocol.
- mysqlx  Used to work with X Protocol sessions using the MySQL X DevAPI.
- session Represents the currently open MySQL session.
- shell   Gives access to general purpose functions and properties.
- util    Global object that groups miscellaneous tools like upgrade checker.
For additional information on these global objects use: <object>.help()
EXAMPLES
\? AdminAPI
Displays information about the AdminAPI.
\? \connect
Displays usage details for the \connect command.
\? check_instance_configuration
Displays usage details for the dba.check_instance_configuration function.
\? sql syntax
Displays the main SQL help categories.
The examples at the end show how you can get additional help. For example, to learn
how to use the \connect command:
mysql-py> \? connect
NAME
connect - Establishes the shell global session.
SYNTAX
shell.connect(connectionData[, password])
WHERE
connectionData: the connection data to be used to establish the
session.
password: The password to be used when establishing the session.
DESCRIPTION
This function will establish the global session with the received
connection data.
…
As the help is very extensive, the whole output is not included here.
You can also get help directly about an object you are working with. For example, you
have the db object with the hol1703 schema. To get help about the db object, use the
help() method of the object:
mysql-py> db.help()
NAME
Schema - Represents a Schema as retrived from a session created
using the
X Protocol.
DESCRIPTION
View Support
MySQL Views are stored queries that when executed produce a
result set.
…
FUNCTIONS
create_collection(name)
Creates in the current schema a new collection with the
specified
name and retrieves an object representing the new
collection
created.
…
As the help output shows, a schema object for example has a method called
create_collection(). You can get additional help about the create_collection()
method by passing that as an argument to the help() method:
mysql-py> db.help("create_collection")
NAME
create_collection - Creates in the current schema a new
collection with
the specified name and retrieves an object
representing the new collection created.
SYNTAX
<Schema>.create_collection(name)
WHERE
name: the name of the collection.
RETURNS
the new created collection.
DESCRIPTION
To specify a name for a collection, follow the naming conventions
in
MySQL.
Note that to get help for SQL statements, you must be connected to a MySQL instance,
and you either need to be in the SQL language mode or explicitly tell MySQL Shell that
you want help for an SQL statement. For example: \? SQL Syntax/SELECT.
Feel free to execute the help commands as you go through the exercises.
Built-in Objects
You have already encountered some of the built-in objects of MySQL Shell: the session
and db objects. However, there are more built-in objects. From the output of the \?
command you executed a little earlier:
db: Used to work with database schema objects.
dba: Used for InnoDB cluster administration.
mysql: Support for connecting to MySQL servers using the classic MySQL
protocol.
mysqlx: Used to work with X Protocol sessions using the MySQL X DevAPI.
session: Represents the currently open MySQL session.
shell: Gives access to general purpose functions and properties.
util: Global object that groups miscellaneous tools like upgrade checker.
The two most useful for this lab are the db and session objects.
You are now ready to create a collection.
Create Collection
Collections are the containers for the documents. In the SQL world a collection would be
called a table. A collection is always located in a schema (database). You can create a
collection using the create_collection() method (createCollection() in
JavaScript) which is part of the db object. The method takes the name of the new
collection as a string. For example, to create the animals collection and keep a
reference to the collection in the animals object:
mysql-py> animals = db.create_collection("animals")
mysql-py> animals
<Collection:animals>
What does a collection look like? You can check the definition of the underlying table
using a SHOW CREATE TABLE SQL query:
mysql-py> \sql
Switching to SQL mode... Commands end with ;
Fetching table and column names from `hol1703` for auto-completion...
Press ^C to stop.
mysql-sql> SHOW CREATE TABLE animals\G
*************************** 1. row ***************************
Table: animals
Create Table: CREATE TABLE `animals` (
`doc` json DEFAULT NULL,
`_id` varbinary(32) GENERATED ALWAYS AS
(json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED NOT NULL,
PRIMARY KEY (`_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.0007 sec)
mysql-sql> \py
Switching to Python mode...
mysql-py>
The collection consists of an InnoDB table with a normal column called doc and a
generated column called _id. The doc column is where the JSON document is stored.
The _id column is generated by extracting the value of the _id object at the base of the
JSON document; this is the primary key of the table and must always be present.
Tip: If you later need to get an object for the animals collection again, you can get it
using: animals = db.get_collection("animals")
What happens if you try to create a JSON document without a _id element? Let’s check.
Adding Documents
The first part of CRUD is to create data. In the MySQL Document Store that means
adding a JSON document to a collection. This is done by using the add() method of the
collection object. For example, to add dogs as an animal (the prompt is omitted to make
it easy to copy and paste):
dog = {
    "Name": "Dog",
    "Class": "Mammal",
    "Species": ["German Shephard",
                "Labrador Retriever",
                "Poodle"],
    "Claim": "Man's best friend",
}
session.start_transaction()
animals.add(dog)
session.commit()
Script file: animals.py
Tip: At the end of a multi-line statement like the assignment of the dog variable, hit
enter twice to exit the multi-line mode.
The first thing to notice here is that there is full transaction support in the Document
Store. So, by using MySQL as a document store, you are not giving up the advantages of
the transaction model.
The second thing is that JSON documents in Python are just a dictionary object ({…}) for
JSON objects and a list ([…]) for JSON arrays. JavaScript also has native support for
JSON documents. This makes it very easy to work with JSON documents in Python and
JavaScript.
This was a very simple example. Let’s try a case where three animals are added:
cat = {
    "Name": "Cat",
    "Class": "Mammal",
    "Species": ["Siamese",
                "Persian",
                "Norwegian Forest"]
}
cockatoo = {
    "Name": "Cockatoo",
    "Class": "Bird",
    "Species": ["Sulphur-Crested",
                "Galah",
                "Gang-Gang"]
}
croc = {
    "Name": "Crocodile",
    "Class": "Reptile",
    "Species": ["Saltwater",
                "Nile",
                "Siamese"]
}
session.start_transaction()
stmt = animals.add(cat)
result = stmt.add((cockatoo, croc)).execute()
print("Number of animals added: {0}"
      .format(result.get_affected_items_count()))
print("Number of warnings: {0}".format(result.get_warnings_count()))
print("Generated IDs: {0}".format(result.get_generated_ids()))
session.commit()
Script file: animals.py
The output of this example is similar to (the generated IDs will be different!):
Number of animals added: 3
Number of warnings: 0
Generated IDs: ["00005b8cae6c000000000000000a",
"00005b8cae6c000000000000000b", "00005b8cae6c000000000000000c"]
This example still just executes a single statement, but more happens. Let’s go through
the example in steps:
1. The three animals are defined.
2. A transaction is started. Even for single-statement operations, this is still
useful as it allows you to verify that everything worked as expected before
committing the change.
3. The cat is added as an animal. In this case, the return value is assigned to the
stmt variable (stmt for statement). This allows you to continue to work with the
statement before executing it.
4. The cockatoo and crocodile are added, and the statement is executed. This
shows two important things: it is possible to add more than one document in
one add() call by using a tuple or list, and it is possible to chain the method calls.
Steps 3 and 4 could have been combined into one line of code if that is
preferred. The second part of the chain here is the execute() method. When
coding in a connector, the execute() method must always be used. However, in
MySQL Shell there is a shortcut: when you use a CRUD method and do not
assign the statement to a variable (this was the case when adding the dog), the
statement is executed implicitly. The return value of execute() is assigned to the
result variable.
5. The result object contains information about the execution of the statement.
The three print() calls print some of this information. This can for example be
used to determine whether any warnings occurred before committing the
transaction. For this discussion, the generated IDs are the most interesting; more
about those shortly.
6. The transaction is committed.
As you can see there is some flexibility for creating the statements, so you can choose
what works best in your workflow.
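The steps above can be wrapped in a function in the style required by a connector, where execute() is always called explicitly. This is a sketch under assumptions rather than code from the lab scripts: the function name add_documents() is hypothetical, and session.rollback() is assumed to be the X DevAPI counterpart to session.commit() for abandoning the transaction.

```python
def add_documents(session, collection, documents):
    """Add documents in one transaction and return the generated IDs.

    The transaction is rolled back if the statement fails or reports
    warnings; in the latter case an empty list is returned.
    """
    session.start_transaction()
    try:
        # Chain add() and execute(); a connector needs the explicit execute().
        result = collection.add(documents).execute()
    except Exception:
        session.rollback()  # assumed counterpart to commit()
        raise
    if result.get_warnings_count() > 0:
        session.rollback()
        return []
    session.commit()
    return result.get_generated_ids()


# Usage with an open session and a collection object:
#   ids = add_documents(session, animals, [cat, cockatoo, croc])
```

Checking the warning count before committing is one possible policy; depending on the application, warnings might instead be logged and the transaction committed anyway.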
What are the generated IDs? Those are the values of the _id element, which is the
primary key of the document. The animals that were added did not have _id elements
in their documents, so MySQL generated the IDs automatically and added them to the
documents. The generated IDs consist of three parts (all hex encoded):
A prefix: The prefix is used to distinguish different MySQL instances. This can be
configured using the mysqlx_document_id_unique_prefix option. By default it
is 0 except in InnoDB Cluster.
A timestamp. This is the time when the MySQL instance was last restarted.
A counter. An unsigned integer that is incremented each time an ID is generated.
It follows the same rules as an auto-increment column using the value of
auto_increment_offset as the offset and auto_increment_increment as the
increment. If the counter overflows, the timestamp will be incremented by one
and the counter starts from auto_increment_offset again.
The reason for this choice of auto-generated IDs is that it makes it possible to
generate unique IDs across a replication topology and to generate them in such a
way that they are optimized for the underlying storage.
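To make the three parts concrete, a generated ID can be split in Python. The field widths used here (4 hex digits for the prefix, 8 for the timestamp, and the remainder for the counter) are an assumption based on the 28-character IDs shown in the output above; the function name parse_document_id() is hypothetical.

```python
from datetime import datetime, timezone


def parse_document_id(doc_id):
    """Split a generated _id into prefix, server start time, and counter.

    Assumed layout: 4 hex digits prefix, 8 hex digits timestamp,
    remaining hex digits counter.
    """
    prefix = int(doc_id[0:4], 16)
    started = datetime.fromtimestamp(int(doc_id[4:12], 16), tz=timezone.utc)
    counter = int(doc_id[12:], 16)
    return prefix, started, counter


prefix, started, counter = parse_document_id("00005b8cae6c000000000000000a")
print("Prefix ......: {0}".format(prefix))
print("Started .....: {0}".format(started.isoformat()))
print("Counter .....: {0}".format(counter))
```

For the example ID, the prefix is 0 and the counter is 10 (hex a); the next two IDs in the output differ only in the counter.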
If you generate the IDs yourself (or have a natural ID), the ID must be added using the
_id object at the top level of the JSON document and must be at most 32 bytes long.
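If you supply your own IDs, a client-side check before calling add() can catch values that are too long. The following is a minimal sketch; check_document_id() is a hypothetical helper, and it assumes the 32-byte limit applies to the UTF-8 encoded value.

```python
def check_document_id(document):
    """Raise ValueError if a user-supplied _id exceeds 32 bytes.

    Documents without an _id pass the check, as MySQL generates the ID.
    """
    if "_id" not in document:
        return
    encoded = str(document["_id"]).encode("utf-8")
    if len(encoded) > 32:
        raise ValueError(
            "_id is {0} bytes long; the maximum is 32".format(len(encoded)))


check_document_id({"_id": "dog", "Name": "Dog"})  # passes silently
check_document_id({"Name": "Cat"})                # no _id: passes silently
```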
Now that you have some data, you can start querying it.
Tip: If you have not loaded the data yet or want to reset the data, you can execute the
/home/lab/bin/animals.py script.
Finding Documents
You can find documents using the find() method. It is somewhat more complex than the
add() method as there is support for filtering, ordering, grouping, etc. If you are used
to SQL SELECT statements, you can apply all of the same modifications to a query
using a find statement in the document store as you can in a SELECT statement (but
joins are currently not supported).
You get a find statement object by invoking the find() method on a collection. You can
then use the following methods to modify the query:
fields(): Specify which fields to return from the matching documents.
group_by(): Which fields to group the result by.
having(): A filter that is applied after the data has been grouped.
sort(): Which fields to sort the result by.
limit(): The maximum number of documents to return.
offset(): The offset to use together with the limit() method.
lock_exclusive(): Take an exclusive lock on the matching documents.
lock_shared(): Take a shared lock on the matching documents.
bind(): Used to assign values when parameters are used to set the filter
condition.
execute(): Execute the query – a document result object is returned.
You may think there is something missing here – how to set a filter that is evaluated
before a possible group by? That is set directly in the find() call.
Simple Example
As a simple example consider a query to find the known dog species (breeds) together
with the _id for the document:
stmt = animals.find("Name = :animal")
stmt = stmt.fields("_id", "Species")
stmt = stmt.bind("animal", "Dog")
result = stmt.execute()
doc = result.fetch_one()
print("_id .......: {0}".format(doc["_id"]))
print("Species ...: {0}".format(doc["Species"]))
Script: dog_breeds.py
This outputs a result similar to (the value of the _id will be different):
_id .......: 00005b8cae6c0000000000000017
Species ...: ["German Shephard", "Labrador Retriever", "Poodle"]
The example uses a couple of the methods to refine the query:
A filter clause is added to the find() method. The actual name of the animal is
specified using a parameter. This has two advantages: the value is escaped, which
reduces the potential for SQL injection, and the statement is easier to reuse.
The fields() method is used to tell MySQL only to return the _id and Species
fields. The fields are here added as separate arguments, but they could also have
been specified as a single argument using a tuple or a list.
The bind() method is used to specify which animal name the documents should
be filtered by.
One important thing to be aware of is that documents are stored as binary objects.
Comparisons are therefore done at the binary level, which means that, unlike the
default for SQL tables, they are case sensitive!
In the example, the result is retrieved using the fetch_one() method of the result
object. There is also a fetch_all() method to fetch all matching documents.
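Because the filter uses a parameter, the same query can be wrapped in a function and reused for different animals. The sketch below uses only the methods shown in the example; the function name species_for() is hypothetical, and it assumes fetch_one() returns None when no document matches.

```python
def species_for(collection, animal_name):
    """Return the list of species for the animal, or [] if none is found."""
    stmt = collection.find("Name = :animal")
    stmt = stmt.fields("_id", "Species")
    stmt = stmt.bind("animal", animal_name)
    doc = stmt.execute().fetch_one()
    if doc is None:
        # Also the result for "dog" when the document has the name "Dog",
        # as the comparison is case sensitive.
        return []
    return list(doc["Species"])


# Usage with the animals collection:
#   print(species_for(animals, "Dog"))
#   print(species_for(animals, "dog"))
```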
Tip: Remember that you can use the help() method to get information about the available
methods. Try using result.help() and explore some of the methods and properties that
are available. However, do note that the affected_items_count property and the
get_affected_items_count() method are not available for the result of this example.
It is worth considering how the fields are specified in a bit more detail.
Fields and JSON Paths
The field can either be a JSON path or an expression (possibly referencing one or more
fields in the JSON document). A JSON path is created as:
The document root is represented by $. Except for specifying fields for an index
(more later) it is optional whether $ is included in the path. This makes it simple
to specify top level objects as you just need to use the name as in the previous
example.
A dot (.) is used to separate elements. For example, if you have:
{
    "Red": {
        "Pink": "FFC0CB",
        "Pure_red": "FF0000",
        "Maroon": "800000"
    }
}
Then to get the hex value for Maroon you can use $.Red.Maroon.
* is a wildcard that means “all members”. The wildcard can also be used as
[prefix]**{suffix} where the prefix is optional and the suffix is mandatory.
For arrays square brackets can be used to return an element. [N] returns the Nth
element (0-based) and [*] returns all elements (same as not specifying the
brackets at all).
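The path rules can be illustrated with a small resolver working on plain Python dictionaries and lists. This is only a teaching sketch and not how MySQL evaluates paths: resolve_path() is a hypothetical function, it supports only the optional $, dot-separated names, [N], and a trailing [*], and it does not handle the * and ** member wildcards.

```python
import re

# One path step: either .name or [N] or [*]
_STEP = re.compile(r"\.(\w+)|\[(\d+|\*)\]")


def resolve_path(document, path):
    """Resolve a simple JSON path such as $.Red.Maroon or Species[0]."""
    if path.startswith("$"):
        path = path[1:]
    elif path and not path.startswith((".", "[")):
        path = "." + path  # the leading $ is optional
    current = document
    pos = 0
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError("unsupported path: " + path)
        name, index = match.groups()
        if name is not None:
            current = current[name]
        elif index == "*":
            if match.end() != len(path):
                raise ValueError("[*] is only supported as the last step")
            # [*] selects all elements, so the array is returned as is
        else:
            current = current[int(index)]
        pos = match.end()
    return current


colors = {"Red": {"Pink": "FFC0CB", "Pure_red": "FF0000", "Maroon": "800000"}}
print(resolve_path(colors, "$.Red.Maroon"))
```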
There are also several JSON functions that can be used. Let’s take a look at an example
of that.
Example Using JSON Function
As a slightly more complicated example consider a query to find animals that have a
species called Siamese. For this the … function will be used. For each matching
document, return the animal name sorted alphabetically.
The code to perform this query is:
stmt = animals.find("JSON_CONTAINS($.Species, :species)")
stmt = stmt.fields("Name").sort("Name")
stmt = stmt.bind("species", '"Siamese"')
result = stmt.execute()
for doc in result.fetch_all():
    print(doc["Name"])
Script: siamse.py
The statement adds a filter using the JSON_CONTAINS() function to require the Species
array to include a given species. The species value is set using the bind() method.
Notice here how double quotes have been added around Siamese; this is required as
otherwise it would not be a JSON string. The statement also chains the fields() method
and the sort() method to specify which fields (just the Name in this case) to include
and what to sort the documents by.
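Rather than adding the double quotes by hand, the value can be encoded with the json module from the Python standard library, which also escapes any quotes inside the value. The helper name json_literal() is hypothetical.

```python
import json


def json_literal(value):
    """Encode a Python value as JSON text, for example for use with bind()."""
    return json.dumps(value)


species = json_literal("Siamese")
print(species)

# Usage with the find statement from the example:
#   stmt = stmt.bind("species", json_literal("Siamese"))
```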
There are several JSON functions available for querying a document. For a complete list
and description see the MySQL reference manual:
JSON Functions - https://dev.mysql.com/doc/refman/en/json-functions.html
Includes the functions that are neither spatial GeoJSON functions nor aggregate
functions.
Spatial GeoJSON Functions - https://dev.mysql.com/doc/refman/en/spatial-
geojson-functions.html
Functions required for spatial (geographical) queries.
Aggregate (GROUP BY) Function Descriptions -
https://dev.mysql.com/doc/refman/en/group-by-functions.html
Includes the JSON_ARRAYAGG() and JSON_OBJECTAGG() functions to create a
JSON array and a JSON object from aggregating values, respectively.
Before continuing to modify documents, let’s take another break. In the examples so
far, the collection has only included a few small documents, so there has been no need
to consider query plans; a scan of all documents has performed well enough.
However, what if you have large collections and/or large documents? Just like in a
relational database, it is good practice to index the fields most commonly used for
filters, sorting, and grouping.
Indexes
The MySQL Document Store supports indexes on the collections. This can greatly
improve the query performance for queries that otherwise would need to check a large
number of documents to return a few of them or sort/group based on a given field.
You can add an index through the collection object. You can create an index with the
create_index() method which takes two arguments: The index name and a dictionary
defining the index. The definition of the index includes three elements:
type. The index type. Currently normal ordered (B-TREE) indexes and spatial
indexes are supported. Specify INDEX for a normal ordered index and SPATIAL
for a spatial index.
unique. Whether it is a unique index. Currently this must be set to False.
fields. A list of the fields to include in the index.
The fields list includes a dictionary for each field to include in the index. Do note that the
order does matter. Each dictionary element for the field includes some or all of the
following elements:
field. The JSON path to the value to index. The $ to specify the root of the
document must be included.
type. Equivalent to the data type for columns in a relational table. For example
to say the value is an unsigned integer use INT UNSIGNED.
required. Whether the value must be present. Specify as a Boolean. Equivalent
to whether NOT NULL is specified.
collation. For string values which collation to use for comparisons. As of 8.0.12
this is ignored and the collation utf8mb4_0900_ai_ci is always used.
options. For spatial columns this is an integer (1-4) specifying what to do if the
value is of dimension higher than two. Use 1 (the default) to reject such values
and 2, 3, or 4 to accept them. See
https://dev.mysql.com/doc/refman/en/spatial-geojson-
functions.html#function_st-geomfromgeojson for details.
srid. The spatial reference system for spatial values. The default is 4326 (the
usual representation of the Earth). Supported values can be found in the SRS_ID
column of the information_schema.ST_SPATIAL_REFERENCE_SYSTEMS table.
This all feels quite complex, so it is worth looking at an example. In the example you will
add an index covering the Name and Class fields. The easiest way to do this is to work
backwards starting with the definition of the fields:
field_name = {
    "field": "$.Name",
    "type": "TEXT(20)",
    "required": True,
    "collation": "utf8mb4_0900_ai_ci",
}
field_class = {
    "field": "$.Class",
    "type": "TEXT(15)",
    "required": False,
    "collation": "utf8mb4_0900_ai_ci",
}
Script: create_index.py
Defining the fields separately is not required, but it can make it easier to manage when
creating the index. For text fields, it is necessary to specify the length. This is not the
maximum length allowed for the values, but how many characters are used for the
index.
You can then create the index as:
index_def = {
    "fields": [field_name, field_class],
    "type": "INDEX",
}
animals.create_index("Name_Class", index_def)
Script: create_index.py
So, there is a little more work to do here compared to adding an index in a relational
table. The reason for this is that the JSON document is schemaless, so it is essentially
necessary to define the schema for the fields together with the index.
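If you create several indexes, it may help to build the definitions with a small helper. The sketch below is one possible approach, not part of the lab scripts: the function names text_field and index_definition are hypothetical, and only the dictionary keys described above are used. The create_index() call is left as a comment because it needs a live collection.

```python
def text_field(path, length, required=False,
               collation="utf8mb4_0900_ai_ci"):
    """Build one entry for the "fields" list of an index definition."""
    # For index fields the $ for the document root must be included.
    if not path.startswith("$."):
        raise ValueError("index field path must start with '$.': " + path)
    return {
        "field": path,
        # The length is the number of characters used for the index.
        "type": "TEXT({0})".format(length),
        "required": required,
        "collation": collation,
    }


def index_definition(*fields):
    """Combine field entries into a normal (non-spatial) index definition."""
    # The order of the fields matters, so it is preserved as given.
    return {"fields": list(fields), "type": "INDEX"}


index_def = index_definition(
    text_field("$.Name", 20, required=True),
    text_field("$.Class", 15),
)
print(index_def)

# With a live collection:
# animals.create_index("Name_Class", index_def)
```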
It can be interesting to take a look at how the index looks from the table definition side.
For this you need to switch to SQL mode in MySQL Shell and use the SHOW CREATE
TABLE statement (reformatted to account for limited width in this document):
mysql-py> \sql
Switching to SQL mode... Commands end with ;
mysql-sql> SHOW CREATE TABLE animals\G
*************************** 1. row ***************************
Table: animals
Create Table: CREATE TABLE `animals` (
`doc` json DEFAULT NULL,
`_id` varbinary(32)
GENERATED ALWAYS AS
(json_unquote(json_extract(`doc`,_utf8mb4'$._id'))) STORED
NOT NULL,
`$ix_t20_r_4CB1E32CCBE4FE2585D3C8F059CB3A909FC536B7` text
GENERATED ALWAYS AS
(json_unquote(json_extract(`doc`,_utf8mb4'$.Name'))) VIRTUAL
NOT NULL,
`$ix_t15_4E46A273752C805EE33D5BF43813E81B9716DC4D` text
GENERATED ALWAYS AS
(json_unquote(json_extract(`doc`,_utf8mb4'$.Class'))) VIRTUAL,
PRIMARY KEY (`_id`),
KEY `Name_Class`
(`$ix_t20_r_4CB1E32CCBE4FE2585D3C8F059CB3A909FC536B7`(20),
`$ix_t15_4E46A273752C805EE33D5BF43813E81B9716DC4D`(15))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.0006 sec)
mysql-sql> \py
Switching to Python mode...
MySQL Document Store implements the index using a generated virtual column,
similar to how the primary key was implemented, except the primary key was a stored
generated column. The column name is auto-generated such that MySQL can reuse the
same column for multiple indexes. When dropping indexes MySQL will not drop the
column until no more indexes require it.
Tip: Because comparisons using an index are done via the generated column (which for
text columns does have a character set and collation), the comparisons are case
insensitive.
With the index created, MySQL will automatically use it when the optimizer determines
it is useful for the query. As for SQL tables, indexes can help find rows more quickly, but
they come with the additional cost of maintaining the index, and the optimizer has more
choices to evaluate when looking for the cheapest query plan.
Tip: Ensure you have switched back to Python mode (the \py command) before
proceeding.
The next step is to update a document.
Updating Documents
Updating documents is a little more complicated than updating a row in an SQL table.
The reason is that it is necessary to specify what part of the document to update and
that values are not necessarily scalar values.
There are three methods to change documents:
add_or_replace_one(): For upserting a document by specifying the document
ID. If there is already a document with the ID, then it is replaced, otherwise a
new document is added with the ID.
modify(): The most advanced method where you can modify one or more
documents. It can take most of the arguments known from find queries to
specify which documents to change, and additionally there are methods to specify
which change to make.
replace_one(): Similar to add_or_replace_one() but if the document ID does
not already exist, then no change occurs.
The method that will be demonstrated here is the modify() method. As mentioned,
you need to use a sub method to specify how to make the modification of the matching
documents. The available modify methods are:
array_append(): Takes an existing array element and changes it to include the
original value plus the new value(s). For example, if the original array is ["One",
"Two", "Three"] and you want to append [1, "Uno"] to the element with the
value "One", the resulting array is [["One", [1, "Uno"]], "Two", "Three"].
array_insert(): Takes an existing array and inserts into the element position
specified. For example, if the existing document is {"fruits": ["apple",
"orange", "watermelon"]} and the array is specified as fruits[1] with the
new element being kiwi, then the resulting document is {"fruits":
["apple", "kiwi", "orange", "watermelon"]}.
set(): If the specified element exists, update it, otherwise add the element.
patch(): Takes a part of a document and replaces it with a new document part.
This supports adding, replacing, or removing parts of a document. This is a very
powerful yet simple method.
unset(): Removes an element from the document.
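The two array methods are the easiest to mix up, so the following pure-Python sketch mimics the results described above on ordinary lists. It does not call the X DevAPI; the function names mirror the modify methods only for readability, and the handling of an element that is already an array is an assumption about the behaviour rather than something stated in this workbook.

```python
def array_append(array, position, value):
    """Mimic appending a value to the element at the given position."""
    result = list(array)
    target = result[position]
    if isinstance(target, list):
        # Assumption: an element that is already an array is extended.
        result[position] = target + [value]
    else:
        # A scalar element becomes an array of the original and new value.
        result[position] = [target, value]
    return result


def array_insert(array, position, value):
    """Mimic inserting a value at the given position of an array."""
    result = list(array)
    result.insert(position, value)
    return result


print(array_append(["One", "Two", "Three"], 0, [1, "Uno"]))
# [['One', [1, 'Uno']], 'Two', 'Three']
print(array_insert(["apple", "orange", "watermelon"], 1, "kiwi"))
# ['apple', 'kiwi', 'orange', 'watermelon']
```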
Note: It is mandatory to set a filter condition when calling modify(). If you fail to do
this, MySQL raises an ArgumentError (in MySQL Shell in Python mode) or a TypeError
(in Connector/Python) exception.
If you really want to modify all documents, specify "True" or similar as the condition;
this also makes it clear when you look at the code later on that you meant to match all
documents.
Let’s look at two examples: using set() to add/replace a given field and patch() to
add, change, and remove fields.
Modifying Documents with set()
In this example the class of the cat will be changed to “Mammalia” (from the current
“Mammal”). This requires a filter to find documents with Name set to Cat and then
specify the new value for Class:
session.start_transaction()
stmt = animals.modify("Name = :name").set("Class", "Mammalia")
result = stmt.bind("name", "Cat").execute()
print("Number of documents changed: {0}".format(
    result.get_affected_items_count()))
session.commit()
Script: modify_set.py
The filter is set in the same way as when find() was used to read data. The output will
show that 1 document has been updated.
Modifying Documents with patch()
The patch() method is the most powerful of the modify methods, yet it is also
simple. The method takes a JSON document (which may be just part of the document
you want to modify). For each element in the provided document, the element will be
added, modified, or removed depending on whether the element exists and what the
new value is.
As an example consider the animals collection. The existing value for the document with
Name = Dog is:
mysql-py> animals.find("Name = 'Dog'")
[
{
"Claim": "Man's best friend",
"Class": "Mammal",
"Name": "Dog",
"Species": [
"German Shephard",
"Labrador Retriever",
"Poodle"
],
"_id": "00005b95a8cc000000000000000d"
}
]
1 document in set (0.0008 sec)
The patch that will be provided is:
dog_patch = {
    "Class": "Mammalia",
    "Lifespan": {
        "From": 10,
        "To": 15,
        "Record": 27
    },
    "Claim": None,
}
Script: modify_patch.py
Tip: You can also write the patch document as a string. In that case, you need to specify
the lack of a value as null instead of None.
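If you prefer the string form, Python's json module can do the None to null conversion for you. This is a small sketch using only the standard library; the variable name dog_patch_json is just for illustration.

```python
import json

dog_patch = {
    "Class": "Mammalia",
    "Lifespan": {"From": 10, "To": 15, "Record": 27},
    "Claim": None,
}

# json.dumps() writes Python's None as the JSON value null.
dog_patch_json = json.dumps(dog_patch)
print(dog_patch_json)
# {"Class": "Mammalia", "Lifespan": {"From": 10, "To": 15, "Record": 27}, "Claim": null}
```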
You can pass the dog_patch dictionary to the patch() method:
session.start_transaction()
patch_stmt = animals.modify("Name = :name").patch(dog_patch)
patch_result = patch_stmt.bind("name", "Dog").execute()
print("Number of documents changed: {0}".format(
    patch_result.get_affected_items_count()))
session.commit()
Script: modify_patch.py
The new dog document is:
mysql-py> animals.find("Name = 'Dog'")
[
{
"Class": "Mammalia",
"Lifespan": {
"From": 10,
"Record": 27,
"To": 15
},
"Name": "Dog",
"Species": [
"German Shephard",
"Labrador Retriever",
"Poodle"
],
"_id": "00005b95a8cc000000000000000d"
}
]
1 document in set (0.00312 sec)
Script: modify_patch.py
So, the class has been updated, the lifespan added, and the claim removed.
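To see why the result looks the way it does, it can help to apply the same rules by hand. The sketch below is a pure-Python approximation in the style of JSON Merge Patch (RFC 7396): a None value removes the element, a nested dictionary is merged recursively, and anything else replaces the existing value. It is an illustration under that assumption, not the code MySQL runs, and the function name merge_patch is hypothetical.

```python
def merge_patch(target, patch):
    """Return a new document with a merge-patch style change applied."""
    if not isinstance(patch, dict):
        # A non-object patch simply replaces the target value.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # None (JSON null) means: remove the element if it exists.
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


dog = {
    "Claim": "Man's best friend",
    "Class": "Mammal",
    "Name": "Dog",
    "Species": ["German Shephard", "Labrador Retriever", "Poodle"],
    "_id": "00005b95a8cc000000000000000d",
}
dog_patch = {
    "Class": "Mammalia",
    "Lifespan": {"From": 10, "To": 15, "Record": 27},
    "Claim": None,
}

new_dog = merge_patch(dog, dog_patch)
print(sorted(new_dog))
# ['Class', 'Lifespan', 'Name', 'Species', '_id']
```

Fields that are not mentioned in the patch (Name, Species, and _id here) are left untouched, which is why the patch document only needs to contain the parts you want to change.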
The final CRUD operation available is delete.
Deleting Documents
There are two methods to remove documents: remove() and remove_one(). The
remove() method finds documents similar to how the find() method does whereas
remove_one() finds a single document by the ID.
As an example, delete the first document sorted alphabetically by name where the class
is “Mammalia”:
session.start_transaction()
stmt = animals.remove("Class = :class").sort("Name").limit(1)
result = stmt.bind("class", "Mammalia").execute()
print("Number of documents removed: {0}".format(
    result.get_affected_items_count()))
session.commit()
Script: remove.py
This removes the cat document. As for the modify() method, it is mandatory to specify
a filter condition with the remove() method; again you can set the filter to "True" if
you really want to delete all documents.
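The order in which the filter, the sort, and the limit take effect decides which document is removed. The following pure-Python sketch imitates that order on a plain list; it does not touch the database, the sample documents are illustrative only, and the function name select_for_removal is hypothetical.

```python
# Illustrative documents only; the real collection contains more fields.
animal_docs = [
    {"Name": "Dog", "Class": "Mammalia"},
    {"Name": "Cat", "Class": "Mammalia"},
    {"Name": "Eagle", "Class": "Aves"},
]


def select_for_removal(docs, class_name, limit):
    """Filter first, then sort by Name, then keep the first `limit` matches."""
    matches = [doc for doc in docs if doc.get("Class") == class_name]
    matches.sort(key=lambda doc: doc["Name"])
    return matches[:limit]


removed = select_for_removal(animal_docs, "Mammalia", 1)
print([doc["Name"] for doc in removed])  # ['Cat']
```

Note that Python sorts strings by code point whereas MySQL sorts according to the collation in use, so the two can order names differently when case or accents are involved.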
This concludes the Python part of the lab. However, what about the other languages? A
nice thing about the X DevAPI is that it is relatively easy to learn to use it in another
language. Let’s finish off by looking at that.
Comparing Connectors
The last section will implement the same query in Connector/Python,
Connector/Node.js, and Connector/J. It will query the countryinfo collection in the
world_x database and retrieve the information about the document with the document
ID set to USA.
Tip: If you have been experimenting with the world_x schema and want to reset it, you
can do it with the reset_world_x.sh script.
To make it easier to compare the examples directly, the connection arguments are
included directly in the examples. This is not recommended for real programs!
Important: Always store the connection arguments and particularly the password
outside of the application source code.
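One common way to follow this advice is to read the connection arguments from environment variables. The sketch below shows the idea for the dictionary form used with Connector/Python; the variable names HOL_MYSQL_USER, HOL_MYSQL_PASSWORD, HOL_MYSQL_HOST, and HOL_MYSQL_PORT and the function name load_connect_args are assumptions made for this example, not something the lab environment defines.

```python
import os


def load_connect_args(environ=os.environ):
    """Build the connection arguments from environment variables."""
    password = environ.get("HOL_MYSQL_PASSWORD")
    if password is None:
        # Fail early rather than falling back to a hard-coded password.
        raise RuntimeError("HOL_MYSQL_PASSWORD is not set")
    return {
        "user": environ.get("HOL_MYSQL_USER", "hol1703"),
        "password": password,
        "host": environ.get("HOL_MYSQL_HOST", "localhost"),
        "port": int(environ.get("HOL_MYSQL_PORT", "33060")),
    }


# Demonstration with a stand-in environment instead of the real one.
demo_args = load_connect_args({"HOL_MYSQL_PASSWORD": "example-only"})
print({key: value for key, value in demo_args.items() if key != "password"})
# {'user': 'hol1703', 'host': 'localhost', 'port': 33060}

# In a real program:
# session = mysqlx.get_session(**load_connect_args())
```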
For each example, the output is the same:
_id ............: USA
Name ...........: United States
Geography:
Continent ......: North America
Region .........: North America
Surface Area ...: 9363520
MySQL Connector/Python
The MySQL Connector/Python example will be familiar after trying the examples
previously in the lab.
#!/usr/bin/env python
import mysqlx
connect_args = {
    "user": "hol1703",
    "password": "hol@OOW18",
    "host": "localhost",
    "port": 33060,
}
session = mysqlx.get_session(**connect_args)
schema = session.get_schema("world_x")
countryinfo = schema.get_collection("countryinfo")
stmt = countryinfo.find("_id = :country_code")
stmt.fields("_id", "Name", "geography")
stmt.bind("country_code", "USA")
result = stmt.execute()
usa = result.fetch_one()
id = usa["_id"]
name = usa["Name"]
continent = usa["geography"]["Continent"]
region = usa["geography"]["Region"]
surface_area = usa["geography"]["SurfaceArea"]
print("_id ............: {0}".format(id))
print("Name ...........: {0}".format(name))
print("\nGeography:")
print("Continent ......: {0}".format(continent))
print("Region .........: {0}".format(region))
print("Surface Area ...: {0}".format(surface_area))
session.close()
Script: countryinfo.py
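The connection arguments can also be written as a single mysqlx:// URI. When the user name and password are placed before the host, special characters such as the @ in the lab password need to be percent-encoded. The sketch below only builds the string using the standard library; the function name build_uri is hypothetical, and whether you pass the result to get_session() is left as a commented assumption to check against the Connector/Python reference.

```python
from urllib.parse import quote


def build_uri(user, password, host, port):
    """Build a mysqlx:// URI with the credentials percent-encoded."""
    return "mysqlx://{0}:{1}@{2}:{3}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
    )


uri = build_uri("hol1703", "hol@OOW18", "localhost", 33060)
print(uri)  # mysqlx://hol1703:hol%40OOW18@localhost:33060

# Assumption: get_session() accepts a connection string as well as a dict.
# session = mysqlx.get_session(uri)
```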
MySQL Connector/Node.js
Tip: HOL1706 on Tuesday, Oct 23, 11:15 a.m. - 12:15 p.m. is dedicated to
Connector/Node.js and the Document Store.
Since Node.js uses asynchronous execution, the code is a bit different, but should still
look familiar.
#!/usr/bin/env node
'use strict';
const mysqlx = require('@mysql/xdevapi');
const mysqlArgs = {
    host: 'localhost',
    port: 33060,
    password: 'hol@OOW18',
    user: 'hol1703',
};
(async function() {
    let session;
    let docs = [];
    function storeResult(doc) {
        docs.push(doc);
    }
    try {
        session = await mysqlx.getSession(mysqlArgs);
        const schema = session.getSchema('world_x');
        const countryinfo = schema.getCollection('countryinfo');
        const stmt = countryinfo.find('_id = :country_code')
            .fields('_id', 'Name', 'geography');
        stmt.bind('country_code', 'USA');
        const result = await stmt.execute(storeResult);
        const usa = docs[0];
        const id = usa['_id'];
        const name = usa['Name'];
        const continent = usa['geography']['Continent'];
        const region = usa['geography']['Region'];
        const surfaceArea = usa['geography']['SurfaceArea'];
        console.log(`_id ............: ${id}`);
        console.log(`Name ...........: ${name}`);
        console.log('\nGeography:');
        console.log(`Continent ......: ${continent}`);
        console.log(`Region .........: ${region}`);
        console.log(`Surface Area ...: ${surfaceArea}`);
    } catch (err) {
        console.error(err.message);
    } finally {
        session && await session.close();
    }
})();
Script: countryinfo.js
MySQL Connector/J
The main difference when using Connector/J is that Java is a strongly typed language, so
it is necessary to explicitly declare each variable. The example also passes the
connection arguments as a URI instead of a JSON object. An alternative would be to
use java.util.Properties.
import com.mysql.cj.xdevapi.Session;
import com.mysql.cj.xdevapi.SessionFactory;
import com.mysql.cj.xdevapi.Schema;
import com.mysql.cj.xdevapi.Collection;
import com.mysql.cj.xdevapi.FindStatement;
import com.mysql.cj.xdevapi.DocResult;
import com.mysql.cj.xdevapi.DbDoc;
import com.mysql.cj.xdevapi.JsonParser;
import com.mysql.cj.xdevapi.JsonValue;
import com.mysql.cj.xdevapi.JsonNumber;
import com.mysql.cj.xdevapi.JsonString;
public class countryinfo {
    public static void main(String[] args) {
        Session session = new SessionFactory().getSession(
            "mysqlx://localhost:33060?user=hol1703&password=hol@OOW18");
        Schema schema = session.getSchema("world_x");
        Collection countryinfo = schema.getCollection("countryinfo");
        FindStatement stmt = countryinfo.find("_id = :country_code");
        stmt.fields("_id AS _id, Name AS Name, geography AS geography");
        stmt.bind("country_code", "USA");
        DocResult result = stmt.execute();
        DbDoc usa = result.fetchOne();
        String id = ((JsonString) usa.get("_id")).getString();
        String name = ((JsonString) usa.get("Name")).getString();
        DbDoc geo = (DbDoc) usa.get("geography");
        String continent = ((JsonString) geo.get("Continent")).getString();
        String region = ((JsonString) geo.get("Region")).getString();
        Integer surfaceArea = ((JsonNumber) geo.get("SurfaceArea")).getInteger();
        System.out.println("_id ............: " + id);
        System.out.println("Name ...........: " + name);
        System.out.println("\nGeography:");
        System.out.println("Continent ......: " + continent);
        System.out.println("Region .........: " + region);
        System.out.println("Surface Area ...: " + surfaceArea);
        session.close();
    }
}
Script: countryinfo.java
You can compile and execute the program as follows:
shell$ cd ~/bin
shell$ javac -classpath /usr/share/java/mysql-connector-java-8.0.12.jar countryinfo.java
shell$ java -classpath /usr/share/java/mysql-connector-java-8.0.12.jar:/usr/share/java/protobuf.jar:. countryinfo
This concludes the guided part of the lab. You are encouraged to continue playing
with the MySQL Document Store and to try MySQL Connector/Node.js and MySQL
Connector/J in more detail.
References
If you want to learn more about the MySQL Document Store, the following references
are useful:
X DevAPI User Guide: https://dev.mysql.com/doc/x-devapi-userguide/en/
MySQL Connector/Python X DevAPI Reference Documentation:
https://dev.mysql.com/doc/dev/connector-python/8.0/
MySQL Connector/Node.js with X DevAPI:
https://dev.mysql.com/doc/dev/connector-nodejs/8.0/
MySQL Shell Documentation – Python API:
https://dev.mysql.com/doc/dev/mysqlsh-api-python/8.0/
MySQL Shell Documentation – JavaScript API:
https://dev.mysql.com/doc/dev/mysqlsh-api-javascript/8.0/
MySQL Connector/J X DevAPI Reference:
https://dev.mysql.com/doc/dev/connector-j/8.0/
MySQL Shell 8.0 Manual: https://dev.mysql.com/doc/mysql-shell/8.0/en/