Sergei Petrunia, MariaDB
Engine Independent Table Statistics, including Histograms
MySQL User Group NL Meetup, Oct 12th, 2015
Background: statistics
Query optimization
● Rule-based
● Cost-based. Relies on
− Statistics
− Cost model
Table statistics in MySQL (MariaDB < 10.0)
1. #rows in the table
2. #rows in a given index range (e.g. tbl.key < 123)
3. Index statistics: #rows that match tbl.key=const
• e.g. for orders.customer_id=... we get
AVG(#orders for customer)
• Basis for join optimization
• ANALYZE collects this
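To make item 3 concrete, here is a small Python sketch (not MariaDB code) of how an average rows-per-key figure turns into a join size estimate. The sample data and the function names are hypothetical, chosen only for illustration.

```python
from collections import Counter

def avg_rows_per_key(key_values):
    """Average number of rows sharing one key value: #rows / #distinct keys."""
    counts = Counter(key_values)
    return len(key_values) / len(counts)

def estimate_join_rows(outer_rows, rows_per_key):
    """Expected rows from an index-lookup join: one lookup per outer row."""
    return outer_rows * rows_per_key

# Hypothetical orders.customer_id column:
# customer 1 has 3 orders, customer 2 has 1, customer 3 has 2.
orders_customer_id = [1, 1, 1, 2, 3, 3]

fanout = avg_rows_per_key(orders_customer_id)  # 6 rows / 3 customers = 2.0
print(fanout)                                  # 2.0
print(estimate_join_rows(100, fanout))         # 200.0 orders for 100 customers
```

The estimate is only an average: a customer with thousands of orders and a customer with one order both count as "2.0".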
Issues with statistics
● Issue #1: index statistics are imprecise and can vary
− InnoDB collects stats using sampling
− innodb_stats_persistent (ON since 5.6)
− Still, can vary widely
● Issue #2: not enough statistics
− tbl.non_indexed_col IS [NOT] NULL
− tbl.non_indexed_col BETWEEN 10 AND 20
JOINs need column statistics
select *
from order
  join customer on order.cust_id = customer.cust_id
  join supplier on order.order_id = supplier.order_id
where order.priority='high' and order.total_price > 1K
  and customer.status='vip' and customer.country='Germany'
  and supplier.industry='electronics' and supplier.country='Finland'
Solution: EITS
EITS = Engine Independent Table Statistics
● mysql.table_stats
− #rows in table
● mysql.index_stats
− Index cardinality for each prefix. Gives AVG(#rows for key value)
● mysql.column_stats
− MIN value, MAX value
− Fraction of NULL values
− #different values
− Histogram
Provides estimates for range conditions:
− non_key_col > 'foo'
− non_key_col=1234
− non_key_col IS [NOT] NULL
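As a rough illustration of why these per-column numbers are enough to estimate such conditions, here is a minimal Python sketch. It is not MariaDB's implementation: the dictionary keys and function names are illustrative, and the equality estimate assumes values are spread uniformly over the distinct values.

```python
def column_stats(values):
    """Compute simple per-column statistics; None stands for SQL NULL."""
    non_null = [v for v in values if v is not None]
    return {
        "min_value": min(non_null) if non_null else None,
        "max_value": max(non_null) if non_null else None,
        "nulls_ratio": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distinct_values": len(set(non_null)),
    }

def is_null_selectivity(stats, negate=False):
    """Fraction of rows matching col IS NULL (or IS NOT NULL when negate=True)."""
    return 1.0 - stats["nulls_ratio"] if negate else stats["nulls_ratio"]

def equality_selectivity(stats):
    """Fraction of rows matching col = const, assuming a uniform distribution."""
    if stats["distinct_values"] == 0:
        return 0.0
    return (1.0 - stats["nulls_ratio"]) / stats["distinct_values"]

# Hypothetical non-indexed column with 8 rows, 2 of them NULL.
stats = column_stats([10, 20, 20, None, 30, None, 10, 40])
print(stats["min_value"], stats["max_value"])    # 10 40
print(is_null_selectivity(stats))                # 0.25
print(is_null_selectivity(stats, negate=True))   # 0.75
print(equality_selectivity(stats))               # 0.1875
```

Range conditions such as non_key_col > 'foo' need more than MIN/MAX on non-uniform data, which is where the histogram comes in.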
Collecting EITS statistics
● Disabled by default
● Must be collected manually (ANALYZE TABLE)
− Takes a table/index scan
set histogram_size=200;  -- if you want histograms (you do)
analyze table tbl persistent for columns (col1, col2, ...) indexes (idx1, idx2, ...);
analyze table tbl persistent for all;
set use_stat_tables='preferably';
analyze table tbl;
Collecting EITS statistics
set histogram_size=200;
set use_stat_tables='preferably';
analyze table orders;
+----------------+---------+----------+-----------------------------------------+
| Table          | Op      | Msg_type | Msg_text                                |
+----------------+---------+----------+-----------------------------------------+
| dbt3sf1.orders | analyze | status   | Engine-independent statistics collected |
| dbt3sf1.orders | analyze | status   | OK                                      |
+----------------+---------+----------+-----------------------------------------+
● Can also modify statistics directly
insert into mysql.column_stats values(...);
flush table ...;
Enabling use of EITS statistics
● Statistics use not enabled by default
set use_stat_tables='preferably';  -- or 'complementary'
set optimizer_use_condition_selectivity=4;  -- 1..5
● Can enable globally or per-session
− Or even per-query (MariaDB 10.1+): set statement var=value for query;
New statistics test run
select *
from lineitem, orders
where o_orderkey=l_orderkey
  and o_orderdate between '1990-01-01' and '1998-12-06'
  and l_extendedprice > 1000000
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+
|id|select_type|table   |type|possible_keys|key    |key_len|ref              |rows   |filtered|Extra      |
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+
|1 |SIMPLE     |orders  |ALL |PRIMARY      |NULL   |NULL   |NULL             |1494230|  100.00|Using where|
|1 |SIMPLE     |lineitem|ref |PRIMARY,i_...|PRIMARY|4      |orders.o_orderkey|      2|  100.00|Using where|
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+
● 4.2 seconds
● filtered=100%
− Close to truth for o_orderdate between ...
− Far from truth for l_extendedprice > 1000000
− In 10.1, can use “ANALYZE statement” to check this
New statistics test run (2)
● Collect table statistics
set histogram_size=200;
set use_stat_tables='preferably';
analyze table lineitem, orders;
+------------------+---------+----------+-----------------------------------------+
| Table            | Op      | Msg_type | Msg_text                                |
+------------------+---------+----------+-----------------------------------------+
| dbt3sf1.lineitem | analyze | status   | Engine-independent statistics collected |
| dbt3sf1.lineitem | analyze | status   | OK                                      |
| dbt3sf1.orders   | analyze | status   | Engine-independent statistics collected |
| dbt3sf1.orders   | analyze | status   | OK                                      |
+------------------+---------+----------+-----------------------------------------+
● Make the optimizer use it
set optimizer_use_condition_selectivity=4;
New statistics test run (3)
● Re-run the query
select *
from lineitem, orders
where o_orderkey=l_orderkey
  and o_orderdate between '1990-01-01' and '1998-12-06'
  and l_extendedprice > 1000000
+--+-----------+--------+------+-------------+-------+-------+-------------------+-------+--------+-----------+
|id|select_type|table   |type  |possible_keys|key    |key_len|ref                |rows   |filtered|Extra      |
+--+-----------+--------+------+-------------+-------+-------+-------------------+-------+--------+-----------+
|1 |SIMPLE     |lineitem|ALL   |PRIMARY,i_...|NULL   |NULL   |NULL               |6001215|    0.50|Using where|
|1 |SIMPLE     |orders  |eq_ref|PRIMARY      |PRIMARY|4      |lineitem.l_orderkey|      1|   99.50|Using where|
+--+-----------+--------+------+-------------+-------+-------+-------------------+-------+--------+-----------+
● lineitem.filtered=0.5%: the selectivity estimate for l_extendedprice > 1000000
● 1.5 sec (down from 4.2 sec)
− The gain can be much larger for many-table joins.
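The effect of the new join order can be checked with back-of-the-envelope arithmetic on the rows and filtered columns of the two EXPLAIN outputs. The Python sketch below only counts how many rows of the first table survive its WHERE condition and therefore drive lookups into the second table. That is a simplification of the real cost model, and the function name is hypothetical.

```python
def rows_after_filter(rows, filtered_pct):
    """Rows of a table expected to pass its WHERE condition (rows * filtered%)."""
    return rows * filtered_pct / 100.0

# Numbers copied from the two EXPLAIN outputs in this test run.
old_plan = rows_after_filter(1494230, 100.00)  # orders first, no column statistics
new_plan = rows_after_filter(6001215, 0.50)    # lineitem first, with histograms

print(old_plan)         # 1494230.0 index lookups into lineitem
print(round(new_plan))  # 30006 index lookups into orders
```

Under this simplified count the new plan performs roughly 50 times fewer lookups, even though it starts with the larger table.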
Histogram properties
● Histograms are height-balanced
[Figure: width-balanced vs. height-balanced histogram]
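The difference between the two histogram kinds can be sketched in a few lines of Python. This is a textbook illustration and not MariaDB's storage format: the function names and sample data are hypothetical, and the code assumes numeric values with at least as many rows as buckets.

```python
def width_balanced(values, n_buckets):
    """Equal-width buckets: fixed value ranges, store the row count of each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    if width == 0:  # all values are equal, so everything lands in bucket 0
        counts[0] = len(values)
        return counts
    for v in values:
        # Clamp so that the maximum value falls into the last bucket.
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    return counts

def height_balanced(values, n_buckets):
    """Equal-height buckets: same row count in each, store the upper bounds."""
    ordered = sorted(values)
    n = len(ordered)
    # The upper bound of bucket i is the value found at the i-th quantile.
    return [ordered[max((i * n) // n_buckets - 1, 0)]
            for i in range(1, n_buckets + 1)]

data = [1, 2, 2, 3, 3, 3, 4, 4, 50, 100]
print(width_balanced(data, 4))   # [8, 1, 0, 1]
print(height_balanced(data, 5))  # [2, 3, 3, 4, 100]
```

With the same small budget, the width-balanced version spends most of its buckets on the nearly empty range above 25, while the height-balanced version puts its bucket boundaries where the rows actually are.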
Histogram properties
● Good for continuous, densely populated domains
− DATE[TIME], sequential identifiers, prices, counts, ...
● Not as good for sparse domains
− VARCHAR(100) CHARSET UTF8
● Not as good for highly-skewed domains
− List of popular items would work better
− Should still provide an estimate that's better than no estimate
● Can try different histogram settings (histogram_size is limited to 255):
set histogram_size=255, histogram_type='single_prec_hb';
set histogram_size=128, histogram_type='double_prec_hb';
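To see why dense domains are estimated well and skewed or sparse ones less so, here is a minimal Python sketch of a range estimate over a height-balanced histogram. It uses generic linear interpolation inside a bucket. Whether MariaDB computes the estimate exactly this way is an assumption, and the function name and sample bounds are hypothetical.

```python
from bisect import bisect_left

def selectivity_less_than(bounds, min_value, const):
    """Estimate the fraction of rows with col < const.

    bounds    -- sorted upper bounds of equal-height buckets
    min_value -- smallest value in the column
    """
    if const <= min_value:
        return 0.0
    if const > bounds[-1]:
        return 1.0
    # First bucket whose upper bound is >= const.
    i = bisect_left(bounds, const)
    lower = bounds[i - 1] if i > 0 else min_value
    upper = bounds[i]
    # bisect_left guarantees lower < const <= upper, so upper - lower > 0.
    inside = (const - lower) / (upper - lower)
    # i whole buckets lie below const, plus a fraction of bucket i.
    return (i + inside) / len(bounds)

# Dense, evenly spread domain: the estimate is accurate.
print(selectivity_less_than([10, 20, 30, 40], 0, 25))              # 0.625

# Skewed data [1, 2, 2, 3, 3, 3, 4, 4, 50, 100]: true answer for < 50 is 0.8.
print(round(selectivity_less_than([2, 3, 3, 4, 100], 1, 50), 3))   # 0.896
```

The second estimate is off because the last bucket spans 4..100 and interpolation assumes rows are spread evenly inside it. A larger histogram_size narrows the buckets and reduces this error.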
EITS summary
● New kind of statistics in MariaDB 10.0
− Complements InnoDB's statistics
● Must be collected manually
− set histogram_size=255;
− analyze table tbl persistent for all;
● Must be enabled to be used (safe!)
− set use_stat_tables='preferably';
− set optimizer_use_condition_selectivity=4;
● Please report your experience!
Thanks!