Upload
lyhanh
View
229
Download
0
Embed Size (px)
Citation preview
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Copyright © 2014, Oracle and/or its affiliates. All rights reserved.
Manyi Lu, Olav Sandstaa MySQL Optimizer Team, Oracle September, 2015
MySQL 5.7: What’s New in the Optimizer?
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL Optimizer
SELECT a, b FROM t1, t2, t3 WHERE t1.a = t2.b AND t2.b = t3.c AND t2.d > 20 AND t2.d < 30;
MySQL Server
Cost based optimizations
Heuristics
Cost Model
Op
tim
izer
Table/index info (data dictionary)
Statistics (storage engines)
t2 t3
t1
Table scan
Range scan
Ref access
JOIN
JOIN
Pars
er
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL Optimizer: Design Principles
• Best out of the box performance
• Easy to use, minimum tuning needed
• When you need to understand: explain and trace
• Flexibility through optimizer hints, switches, and plugins
• Fast evolving
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7 Optimizer Improvements
• Generated columns and functional indexes
• New JSON datatype and functions
• Improved cost model
• New hint syntax and improved hint support
• Query rewrite plugin
• UNION ALL queries do not always use temporary tables
• Improved optimizations for queries with IN expressions
• Explain on a running query
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7 Generated Columns
• Column generated from the expression
• VIRTUAL: computed when read, not stored, indexable
• STORED: computed when inserted/updated, stored in SE, indexable
• Useful for: – Functional index
– Materialized cache for complex conditions
– Simplify query expression
CREATE TABLE order_lines (orderno integer, lineno integer, price decimal(10,2), qty integer, sum_price decimal(10,2) GENERATED ALWAYS AS (qty * price) STORED );
Kodus to Andrey Zhakov for his contribution!
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7 Functional Index
• Online index creation
• Composite index on a mix of ordinary, virtual and stored columns
CREATE TABLE order_lines (orderno integer, lineno integer, price decimal(10,2), qty integer, sum_price decimal(10,2) GENERATED ALWAYS AS (qty * price) VIRTUAL); ALTER TABLE order_lines ADD INDEX idx (sum_price);
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Generated column: STORED vs VIRTUAL
• STORED –Requires table rebuild at creation
–Updating table data at INSERT/UPDATE
–Fast retrieval
9
• VIRTUAL –Metadata change only, instant
–Faster INSERT/UPDATE, no change to table
–Compute when read
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Indexing Generated column: STORED vs VIRTUAL (InnoDB)
• STORED –Primary & secondary index
–B-TREE, Full text search, R-TREE
–Duplication of data in base table and index
– Independent of SE
–Online operation
10
• VIRTUAL –Secondary index only
–B-TREE only
–Less storage
–Requires SE support
–Online operation
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
JSON Support
• Seemless integration of relational and schema-less data
• Leverage existing database infrastructure for new applications
• Provide a native JSON datatype
• Provide a set of built-in JSON functions
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
New JSON Datatype
• You can always store JSON data in TEXT or VARCHAR column, but
• native JSON datatype has many benefits: – Validation on insert
– Efficient access
CREATE TABLE employees (data JSON); INSERT INTO employees VALUES ('{"id": 1, "name": "Jane"}'); INSERT INTO employees VALUES ('{"id": 2, "name": "Joe"}');
SELECT * FROM employees; +-------------------------------------+ | data | +-------------------------------------+ | {"id": 1, "name": "Jane"} | | {"id": 2, "name": "Joe"} | +-------------------------------------+ 2 rows in set (0,00 sec)
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
New functions to handle JSON data • Info
– JSON_VALID()
– JSON_TYPE()
– JSON_KEYS()
– JSON_LENGTH()
– JSON_DEPTH()
– JSON_CONTAINS_PATH()
– JSON_CONTAINS()
14
• Modify
– JSON_REMOVE()
– JSON_ARRAY_APPEND()
– JSON_SET()
– JSON_INSERT()
– JSON_ARRAY_INSERT()
– JSON_REPLACE()
• Create
– JSON_MERGE()
– JSON_ARRAY()
– JSON_OBJECT()
• Get data
– JSON_EXTRACT()
– JSON_SEARCH()
• Helper
– JSON_QUOTE()
– JSON_UNQUOTE()
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Inlining JSON PATH Expressions in SQL
[[database.]table.]column->”$<path spec>” (GA version)
SELECT * FROM employees WHERE data->”$.name” = “Jane"; Is a short hand for SELECT * FROM employees WHERE JSON_EXTRACT(data, “$.name” ) = “Jane”;
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Indexing JSON data
• Use Functional Indexes,
• STORED and VIRTUAL types are supported
16
CREATE TABLE employees (data JSON); ALTER TABLE employees ADD COLUMN name VARCHAR(30) AS (JSON_UNQUOTE(data->”$.name”)) VIRTUAL, ADD INDEX name_idx (name);
• Functional index approach
• Use JSON_EXTRACT to specify field to be indexed
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Using Real Life Data
• Via SF OpenData
• 206K JSON objects representing subdivision parcels.
• Imported from https://github.com/zemirco/sf-city-lots-json + small tweaks
17
CREATE TABLE features ( id INTEGER NOT NULL auto_increment primary key, feature JSON NOT NULL );
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 18
{
"type":"Feature",
"geometry":{
"type":"Polygon",
"coordinates":[
[
[-122.42200352825247,37.80848009696725,0],
[-122.42207601332528,37.808835019815085,0],
[-122.42110217434865,37.808803534992904,0],
[-122.42106256906727,37.80860105681814,0],
[-122.42200352825247,37.80848009696725,0]
]
]
},
"properties":{
"TO_ST":"0",
"BLKLOT":"0001001",
"STREET":"UNKNOWN",
"FROM_ST":"0",
"LOT_NUM":"001",
"ST_TYPE":null,
"ODD_EVEN":"E",
"BLOCK_NUM":"0001",
"MAPBLKLOT":"0001001"
}
}
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Naive Performance Comparison
19
# as JSON type SELECT DISTINCT feature->"$.type" as json_extract FROM features; +------------------+ | json_extract | +------------------+ | "Feature" | +-----------------+ 1 row in set (1.25 sec)
Unindexed traversal of 206K documents
# as TEXT type SELECT DISTINCT feature->"$.type" as json_extract FROM features; +------------------+ | json_extract | +------------------+ | "Feature" | +------------------+ 1 row in set (12.85 sec)
Explanation: Binary format of JSON type is very efficient at searching. Storing as TEXT performs over 10x worse at traversal.
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Create Index
20
ALTER TABLE features ADD feature_type VARCHAR(30) AS (feature->"$.type") VIRTUAL; Query OK, 0 rows affected (0.01 sec) Records: 0 Duplicates: 0 Warnings: 0 ALTER TABLE features ADD INDEX (feature_type); Query OK, 0 rows affected (0.73 sec) Records: 0 Duplicates: 0 Warnings: 0 SELECT DISTINCT feature_type FROM features; +-------------------+ | feature_type | +-------------------+ | "Feature" | +-------------------+ 1 row in set (0.06 sec)
From table scan on 206K documents to index scan on 206K materialized values
Down from 1.25 sec to 0.06 sec
Creates index online. Does not modify table rows.
Meta data change only (FAST). Does not need to touch table.
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
JSON Path Search
• Provides a novice way to know the path. To retrieve via: [[database.]table.]column->"$<path spec>" (GA VERSION ONLY)
21
SELECT JSON_SEARCH(feature,'one', 'MARKET') AS extract_path FROM features WHERE id = 121254; +--------------------------------+ | extract_path | +--------------------------------+ | "$.properties.STREET" | +--------------------------------+ 1 row in set (0.00 sec)
SELECT feature->"$.properties.STREET" AS property_street FROM features WHERE id = 121254; +-----------------------+ | property_street | +-----------------------+ | "MARKET" | +-----------------------+ 1 row in set (0.00 sec)
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
JSON Array Creation
22
SELECT JSON_ARRAY(id, feature->"$.properties.STREET", feature->"$.type") AS json_array FROM features ORDER BY RAND() LIMIT 3; +------------------------------------------+ | json_array | +------------------------------------------+ | [65298, "10TH", "Feature"] | | [122985, "08TH", "Feature"] | | [172884, "CURTIS", "Feature"] | +------------------------------------------+ 3 rows in set (2.66 sec)
Evaluates a (possibly empty) list of values and returns a JSON array containing those values
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Motivation for Changing the Cost Model
• Adopt to new hardware architectures
– SSD, larger memories, caches
• Allows storage engines to provide accurate and dynamic cost estimate
– Whether the data is in RAM, SSD, HDD?
• More maintainable cost model implementation – Avoid hard coded constants
– Refactoring of existing cost model code
• Tunable/configurable
• Replace heuristics with cost based decisions
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Cost Model: Main Focus in 5.7
Address the following pain points in current cost model:
• Inaccurate record estimation for JOIN
Solution: condition filtering
• Hard-coded cost constants
Solution: adjustable cost parameters
• Imprecise cardinality/records per key estimates from SE
Solution: Integer value replaced by floating point
• Hard to obtain detailed cost numbers
Solution: Added to JSON explain and MySQL WorkBench
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.6: Record Estimates for JOIN
• t1 JOIN t2
• Total cost = cost (access method t1) + Prefix_rows_t1 * cost (access method t2)
• Prefix_rows_t1 is records read by t1
– Overestimation if where conditions apply!->Suboptimial join order
Without condition filtering
t1 t2
Acc
ess
Met
ho
d
Prefix_rows_t1 Number of records read
from t1
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7 Improved Record Estimates for JOIN
• t1 JOIN t2
• Prefix_rows_t1 Takes into account the entire query condition
– More accurate record estimate -> improved JOIN order
Condition filter
t1 t2
Acc
ess
Met
ho
d
Number of records read
from t1
Co
nd
itio
n f
ilter
Prefix_rows_t1 Records passing the table
conditions on t1
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
• 10 000 rows in the emp table • 100 rows in the office table • 100 rows with first_name=”John” AND hire_date BETWEEN “2014-01-
01″ AND “2014-06-01″
MySQL 5.7 Improved Record Estimates for JOIN
CREATE TABLE emp ( id INTEGER NOT NULL PRIMARY KEY, office_id INTEGER NOT NULL, first_name VARCHAR(20), hire_date DATE NOT NULL, KEY office (office_id) ) ENGINE=InnoDB;
CREATE TABLE office ( id INTEGER NOT NULL PRIMARY KEY, officename VARCHAR(20) ) ENGINE=InnoDB;
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Table Type Possible keys Key Ref Rows Filtered Extra
office ALL PRIMARY NULL NULL 100 100.00 NULL
employee ref office office office.id 99 100.00 Using where
MySQL 5.7 Improved Record Estimates for JOIN
Explain for 5.6: Total Cost = cost(scan office) + 100 * cost(ref_access emp)
Explain for 5.7: Total Cost = cost(scan emp) + 9991*1.23% * cost(eq_ref_access office)
SELECT office_name
FROM office JOIN employee ON office.id = employee.office
WHERE employee.name LIKE “John” AND
hire_date BETWEEN “2014-01-01” AND “2014-06-01”;
Table Type Possible keys Key Ref Rows Filtered Extra
employee ALL NULL NULL NULL 9991 1.23 NULL
office eq_ref PRIMARY PRIMARY employee.office 1 100.00 Using where JOIN ORDER HAS CHANGED!
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7: Improved HINTs
• Introduced new hint syntax /*+ ...*/
– Flexibility over optimizer switch, effect individual statement only
– Hints within one statement take preceduence over optimizer switch
– Hints apply at different scope levels: global, query block, table, index
• Extended hint support
– BKA, BNL, MRR, ICP, SEMIJOIN, SUBQUERY,MAX_EXECUTION_TIME++
– Disabling prevents optimizer to use it
– Enabling means optimizer is free to use it, but is not forced to use it
• Will gradually replace the old hint syntax in upcoming releases
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7: Hint Example: MAX_EXECUTION_TIME
SELECT /*+ MAX_EXECUTION_TIME(1) */ * FROM t1 a, t1 b, t1 c, t1 d, t1 e LIMIT 1; ERROR 3024 (HY000): Query execution was interrupted, maximum statement execution time exceeded SELECT /*+ MAX_EXECUTION_TIME(1000) */ * FROM t1 a, t1 b, t1 c, t1 d, t1 e LIMIT 1; +---+-----+----+------+----+-----+----+------+---+------+ | a | b | a | b | a | b | a | b | a | b | +---+-----+----+------+----+-----+----+------+---+------+ | 1 | 10 | 1 | 10 | 1 | 10 | 1 | 10 | 1 | 10 | +---+------+---+------+----+-----+----+------+---+------+ 1 row in set (0,00 sec)
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7: Hint Example: SEMIJOIN
EXPLAIN SELECT * FROM t2 WHERE t2.a IN (SELECT a FROM t3);
id Select_type Table Type Possible_keys Key Key_len Ref Rows Extra
1 simple t3 index a a 4 null 3 Using where; LooseScan
1 simple t2 ref a a 4 test.t3.a 1 Using index
EXPLAIN SELECT * FROM t2 WHERE t2.a IN (SELECT /*+ NO_SEMIJOIN() */ a FROM t3);
id Select_type Table Type Possible_keys Key Key_len Ref Rows Extra
1 primary t2 index null a 4 null 4 Using where; Using index
2 dependent subquery
t3 Index_ subquery
a a 4 func 1 Using index
No hint, optimizer chooses semi-join algorithm loosescan
Semi-join disabled with hint, subquery is executed for each row of outer table
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7: Hint Example: SEMIJOIN
EXPLAIN SELECT /*+ SEMIJOIN(@subq MATERIALIZATION) */ * FROM t2 WHERE t2.a IN (SELECT /*+ QB_NAME(subq) */ a FROM t3);
id Select_type Table Type Possible_keys
Key Key_len Ref Rows Extra
1 simple t2 index a a 4 null 4 Using where;using index
1 simple <subquery2> eq_ref <auto_key> <auto_key> 4 test.t2.a 1 null
2 materialized t3 index a a 4 null 3 Using index 3 rows in set, 1 warning (0.01 sec)
Hint on a particular algorithm, in this case semi-join materialization
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7: Why Query Rewrite Plugin?
• Problem
– Optimizer choses a suboptimal plan
– Users can change the query plan by adding hints or rewrite the query
– However, database application code cannot be changed
• Solution: query rewrite plugin!
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7: Query Rewrite Plugin
• New pre and post parse query rewrite APIs
– Users can write their own plug-ins
• Provides a post-parse query plugin
– Rewrite problematic queries without the need to make application changes • Add hints
• Modify join order
• Rewrite rules are defined in a table
• Improve problematic queries from ORMs, third party apps, etc
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7 How Rewrites Happen?
For query
SELECT * FROM t1 JOIN t2 ON t1.keycol = t2.keycol WHERE col1 = 42 AND col2 = 2
Replace parameter markers in Replacement with actual literals:
Pattern is:
SELECT *
FROM t1 JOIN t2 ON t1.keycol =
t2.keycol
WHERE col1 = ? AND col2 = ?
Replacement is:
SELECT a, b, c
FROM t1 STRAIGHT_JOIN t2 FORCE
INDEX (col1)
ON t1.keycol = t2.keycol
WHERE col1 = ? AND col2 = ?
SELECT a, b, c
FROM t1 STRAIGHT_JOIN t2 FORCE INDEX (col1)
ON t1.keycol = t2.keycol WHERE col1 = 42 AND col2 = 2
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7Query Rewrite Plugin cont. query_rewrite.rewrite_rules table:
Pattern Pattern_ database
Replacement Enabl
ed Message
SELECT name, department_name FROM employee JOIN department USING ( department_id ) WHERE salary > ?
employees
SELECT name, department_name FROM employee STRAIGHT_JOIN department USING ( department_id ) WHERE salary > ?
Y NULL
SELECT name, department_name FROXM employee JOIN department USING ( department_id ) WHERE salary > ?
employees
SELECT name, department_name FROM employee STRAIGHT JOIN department USING ( department_id ) WHERE salary > ?
N
Parse error in pattern:……near ……at line 1
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7 Query Rewrite Plugin: Performance Impact
What is the Cost of Rewriting queries?
• Designed for rewriting problematic queries only!
• ~ Zero cost for queries not to be rewritten
– Statement digest computed for performance schema anyway
• Cost of queries to be rewritten is insignificant compared to performance gain
– Cost of generating query + reparsing max ~5% performance overhead
– Performance gain potentially x times
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
User Requested
Performance Improvements
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7: Avoid Creating Temporary Table for UNION ALL
SELECT * FROM table_a UNION ALL SELECT * FROM table_b;
• 5.6: Always materialize results of UNION ALL in temporary tables
• 5.7: Do not materialize in temporary tables unless used for sorting, rows are sent directly to client
• 5.7: Client will receive the first row faster, no need to wait until the last query block is finished
• 5.7: Less memory and disk consumption
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7: Optimizations for IN Expressions
• 5.6: Certain queries with IN predicates can’t use index scans or range scans even though all the columns in the query are indexed.
• 5.6: Range optimizer ignores lists of rows
• 5.6: Needs to rewrite to de-normalized form
SELECT a, b FROM t1 WHERE ( a = 0 AND b = 0 ) OR ( a = 1 AND b = 1 )
• 5.7: IN queries with row value expressions executed using range scans.
• 5.7: Explain output: Index/table scans change to range scans
CREATE TABLE t1 (a INT, b INT, c INT, KEY x(a, b)); SELECT a, b FROM t1 WHERE (a, b) IN ((0, 0), (1, 1));
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7: Optimizations for IN Expressions
• A table has 10 000 rows, 2 match the where condition
MySQL 5.6:
**************1. row *****************
select_type: SIMPLE
table: t1
type: index
key: x
key_len: 10
ref: NULL
rows: 10 000
Extra: Using where; Using index
MySQL 5.7:
*************1. row *****************
select_type: SIMPLE
table: t1
type: range
key: x
key_len: 10
ref: NULL
rows: 2
Extra: Using where; Using index
SELECT a, b FROM t1 WHERE (a, b) IN ((0, 0), (1, 1));
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
MySQL 5.7 Explain on a Running Query
EXPLAIN [FORMAT=(JSON|TRADITIONAL)] FOR CONNECTION <id>;
• Shows query plan on connection <id>
• Useful for diagnostic on long running queries
• Plan isn’t available when query plan is under creation
• Applicable to SELECT/INSERT/DELETE/UPDATE
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
What is on Our Roadmap?
• Improve prepared statement support
• Add histogram
• Support parallel queries
• Common table expression(WITH RECURSIVE)
• Windowing functions