Sergei Petrunia MariaDB Engine Independent Table Statistics including Histograms MySQL User Group NL Meetup Oct, 12th 2015

MariaDB: Engine Independent Table Statistics, including histograms

Background: statistics

Query optimization

● Rule-based

● Cost-based. Relies on

− Statistics

− Cost model

Table statistics in MySQL (MariaDB < 10.0)

1. #rows in the table

2. #rows in a given index range (e.g. tbl.key < 123)

3. Index statistics: #rows that match tbl.key=const

• e.g. for orders.customer_id=... we get AVG(#orders for customer)

• Basis for join optimization

• ANALYZE collects this
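Item 3 comes down to a single division: #rows divided by the number of distinct key values, giving the same answer for every constant. A minimal Python sketch, where the list `customer_ids` is a hypothetical stand-in for orders.customer_id:

```python
from collections import Counter


def avg_rows_per_key(key_values):
    """#rows / #distinct values: the one number index statistics give for key=const."""
    if not key_values:
        return 0.0
    return len(key_values) / len(set(key_values))


# Hypothetical orders.customer_id values: customer 1 has 4 orders, 2 and 3 have 1 each.
customer_ids = [1, 1, 1, 1, 2, 3]
print(avg_rows_per_key(customer_ids))  # 2.0
print(Counter(customer_ids))           # Counter({1: 4, 2: 1, 3: 1})
```

The estimate of 2.0 is used for customer 1 (4 rows) and customer 2 (1 row) alike.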

Issues with statistics

● Issue #1: index statistics are imprecise and vary

− InnoDB collects stats using sampling

− innodb_stats_persistent (ON by default since MySQL 5.6)

− Still, estimates can vary widely

● Issue #2: not enough statistics

− tbl.non_indexed_col IS [NOT] NULL

− tbl.non_indexed_col BETWEEN 10 AND 20

JOINs need column statistics

select * from `order`
  join customer on `order`.cust_id = customer.cust_id
  join supplier on `order`.order_id = supplier.order_id
where `order`.priority='high' and `order`.total_price > 1000
  and customer.status='vip' and customer.country='Germany'
  and supplier.industry='electronics' and supplier.country='Finland'
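For a join like this, the optimizer needs to know how many rows of each table survive their local conditions, and most of those conditions are on non-indexed columns. One common simplification is to treat ANDed conditions as independent and multiply their selectivities. A Python sketch of that idea; every row count and selectivity below is a made-up illustration value, and independence is an assumption, not a property of real data:

```python
from math import prod


def rows_after_filters(table_rows, selectivities):
    """Estimate rows left after ANDed conditions, assuming they are independent."""
    return table_rows * prod(selectivities)


# Hypothetical table sizes and per-condition selectivities for the query above.
estimates = {
    "order": rows_after_filters(1_000_000, [0.2, 0.1]),    # priority, total_price
    "customer": rows_after_filters(100_000, [0.01, 0.05]),  # status, country
    "supplier": rows_after_filters(10_000, [0.3, 0.02]),    # industry, country
}
# In this simplified model, the table with the fewest surviving rows
# is the natural one to start the join from.
print(min(estimates, key=estimates.get))  # customer
```

Without column statistics every one of these selectivities is unknown, so the optimizer cannot tell that customer shrinks the most.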

Solution: EITS

EITS = Engine Independent Table Statistics

● mysql.table_stats

− #rows in table

● mysql.index_stats

− Index cardinality for each prefix. Gives AVG(#rows for key value)

● mysql.column_stats

− MIN value, MAX value

− Fraction of NULL values

− #different values

− Histogram

Provides estimates for range conditions:

− non_key_col > 'foo'

− non_key_col=1234

− non_key_col IS [NOT] NULL
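Each mysql.column_stats field lines up with one of these condition types. The Python sketch below shows plausible estimation rules built only from MIN/MAX, the NULL fraction and the number of distinct values, for a numeric column and without a histogram. The `ColumnStats` class and its field names are hypothetical (not the table's actual column names), and assuming values are spread uniformly between MIN and MAX is a simplification:

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical per-column summary modeled on the fields listed above."""
    min_value: float
    max_value: float
    null_fraction: float   # fraction of NULL values
    distinct_values: int   # number of different non-NULL values


def sel_is_null(s: ColumnStats) -> float:
    return s.null_fraction


def sel_is_not_null(s: ColumnStats) -> float:
    return 1.0 - s.null_fraction


def sel_equals(s: ColumnStats) -> float:
    # Assume non-NULL rows are spread evenly over the distinct values.
    return (1.0 - s.null_fraction) / max(s.distinct_values, 1)


def sel_greater_than(s: ColumnStats, c: float) -> float:
    # Without a histogram, assume values are uniform between MIN and MAX.
    if c >= s.max_value:
        return 0.0
    if c < s.min_value:
        return 1.0 - s.null_fraction
    span = s.max_value - s.min_value
    return (1.0 - s.null_fraction) * (s.max_value - c) / span


price = ColumnStats(min_value=0, max_value=100_000, null_fraction=0.1, distinct_values=50_000)
print(sel_is_null(price), sel_equals(price), sel_greater_than(price, 75_000))
```

The uniformity assumption in `sel_greater_than` is exactly what a histogram replaces with real data about the distribution.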

Collecting EITS statistics

● Disabled by default

● Must be collected manually (ANALYZE TABLE)

− Takes a table/index scan

set histogram_size=200; -- if you want histograms (you do)

analyze table tbl persistent for columns (col1, col2, ...) indexes (idx1, idx2, ...);

analyze table tbl persistent for all;

set use_stat_tables='preferably';
analyze table tbl;
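ANALYZE needs a full scan because every one of these statistics depends on all the rows. A minimal Python sketch of what such a pass could compute for one column, given as a list with None for NULL. It sorts the values to pick height-balanced bucket endpoints, treats `histogram_size` as a bucket count for simplicity (in MariaDB it is a size in bytes), and is not MariaDB's actual implementation:

```python
def collect_column_stats(values, histogram_size):
    """Summarize one column (None = NULL) the way a full-scan ANALYZE might.

    Returns MIN, MAX, the NULL fraction, the number of distinct non-NULL values,
    and the upper endpoints of a height-balanced histogram with
    `histogram_size` buckets (each bucket covers about the same number of rows).
    """
    non_null = sorted(v for v in values if v is not None)
    n = len(non_null)
    stats = {
        "min": non_null[0] if n else None,
        "max": non_null[-1] if n else None,
        "null_fraction": (len(values) - n) / len(values) if values else 0.0,
        "distinct_values": len(set(non_null)),
        "endpoints": [],
    }
    if n and histogram_size > 0:
        # Bucket i ends at the value at rank (i + 1) * n / histogram_size.
        stats["endpoints"] = [
            non_null[max(0, (i + 1) * n // histogram_size - 1)]
            for i in range(histogram_size)
        ]
    return stats


col = [5, None, 1, 3, 2, 4, None, 6, 7, 8]  # hypothetical column values
print(collect_column_stats(col, histogram_size=4))
```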

Collecting EITS statistics

set histogram_size=200;
set use_stat_tables='preferably';
analyze table orders;

+------------------+---------+----------+-----------------------------------------+
| Table            | Op      | Msg_type | Msg_text                                |
+------------------+---------+----------+-----------------------------------------+
| dbt3sf1.orders   | analyze | status   | Engine-independent statistics collected |
| dbt3sf1.orders   | analyze | status   | OK                                      |
+------------------+---------+----------+-----------------------------------------+

● Can also modify statistics directly:

insert into mysql.column_stats values(...);
flush table ...;

Enabling use of EITS statistics

● Use of the statistics is not enabled by default

set use_stat_tables='preferably'; -- or 'complementary'

set optimizer_use_condition_selectivity=4; -- 1..5

● Can enable globally or per-session

− Or even per-query (MariaDB 10.1+): set statement var=value for query;

New statistics test run

select * from lineitem, orders
where o_orderkey=l_orderkey
  and o_orderdate between '1990-01-01' and '1998-12-06'
  and l_extendedprice > 1000000

+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+
|id|select_type|table   |type|possible_keys|key    |key_len|ref              |rows   |filtered|Extra      |
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+
|1 |SIMPLE     |orders  |ALL |PRIMARY      |NULL   |NULL   |NULL             |1494230| 100.00 |Using where|
|1 |SIMPLE     |lineitem|ref |PRIMARY,i_...|PRIMARY|4      |orders.o_orderkey|2      | 100.00 |Using where|
+--+-----------+--------+----+-------------+-------+-------+-----------------+-------+--------+-----------+

● 4.2 seconds

● filtered=100%

− Close to truth for o_orderdate between ...

− Far from truth for l_extendedprice > 1000000

− In 10.1, the ANALYZE statement can be used to check this
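The ANALYZE statement in MariaDB 10.1 executes the query and prints actual values (r_rows, r_filtered) next to the estimates (rows, filtered), which makes gaps like the one for l_extendedprice easy to spot. A small Python helper that flags tables whose filtered estimate is off by more than a chosen factor; the threshold of 10 and the input tuples are hypothetical, not output from this test run:

```python
def misestimated(rows, threshold=10.0):
    """rows: list of (table, filtered_estimate_pct, r_filtered_actual_pct).

    Returns the tables whose estimate and actual value differ by more
    than `threshold` times in either direction.
    """
    bad = []
    for table, est, actual in rows:
        # Clamp to a tiny positive value so 0% does not cause division by zero.
        lo, hi = sorted((max(est, 1e-9), max(actual, 1e-9)))
        if hi / lo > threshold:
            bad.append(table)
    return bad


# Hypothetical (filtered, r_filtered) pairs for a two-table join.
print(misestimated([("t1", 100.0, 98.0), ("t2", 100.0, 0.4)]))  # ['t2']
```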

New statistics test run (2)

set histogram_size=200;
set use_stat_tables='preferably';
analyze table lineitem, orders;

+------------------+---------+----------+-----------------------------------------+
| Table            | Op      | Msg_type | Msg_text                                |
+------------------+---------+----------+-----------------------------------------+
| dbt3sf1.lineitem | analyze | status   | Engine-independent statistics collected |
| dbt3sf1.lineitem | analyze | status   | OK                                      |
| dbt3sf1.orders   | analyze | status   | Engine-independent statistics collected |
| dbt3sf1.orders   | analyze | status   | OK                                      |
+------------------+---------+----------+-----------------------------------------+

set optimizer_use_condition_selectivity=4;

● Collect table statistics

● Make the optimizer use it

New statistics test run (3)

+--+-----------+--------+------+-------------+-------+-------+-------------------+-------+--------+-----------+
|id|select_type|table   |type  |possible_keys|key    |key_len|ref                |rows   |filtered|Extra      |
+--+-----------+--------+------+-------------+-------+-------+-------------------+-------+--------+-----------+
|1 |SIMPLE     |lineitem|ALL   |PRIMARY,i_...|NULL   |NULL   |NULL               |6001215| 0.50   |Using where|
|1 |SIMPLE     |orders  |eq_ref|PRIMARY      |PRIMARY|4      |lineitem.l_orderkey|1      | 99.50  |Using where|
+--+-----------+--------+------+-------------+-------+-------+-------------------+-------+--------+-----------+

select * from lineitem, orders
where o_orderkey=l_orderkey
  and o_orderdate between '1990-01-01' and '1998-12-06'
  and l_extendedprice > 1000000

● Re-run the query

● lineitem.filtered=0.5%, the estimate for l_extendedprice > 1000000

● 1.5 sec (from 4.2 sec)

− The speedup can be much larger for many-table joins
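One way to see why the new plan wins is to multiply out the rows and filtered columns of the two EXPLAIN outputs: rows * filtered / 100 is how many row combinations each table passes to the next one, i.e. how many lookups the next table must do. A Python sketch using the numbers from the two plans above; this fan-out arithmetic is a simplification, not the optimizer's cost formula:

```python
def fan_out(plan):
    """plan: list of (table, rows, filtered_percent) in join order.

    For each table, returns the estimated number of row combinations it
    passes on: everything produced so far * rows * filtered / 100.
    """
    produced = 1.0
    result = []
    for table, rows, filtered in plan:
        produced *= rows * filtered / 100.0
        result.append((table, produced))
    return result


# rows/filtered values copied from the two EXPLAIN outputs.
without_eits = [("orders", 1494230, 100.00), ("lineitem", 2, 100.00)]
with_eits = [("lineitem", 6001215, 0.50), ("orders", 1, 99.50)]

print(fan_out(without_eits))  # orders passes ~1.49M rows on to lineitem lookups
print(fan_out(with_eits))     # lineitem passes ~30K rows on to orders lookups
```

With filtered=100% for l_extendedprice > 1000000, the first plan expects about 3M result rows; the 0.5% estimate brings the expected result down to about 30K, which is what makes reading lineitem first attractive.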

Histogram properties

● Histograms are height-balanced histograms

[Figure: width-balanced vs. height-balanced histogram]
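Because every bucket of a height-balanced histogram holds about the same number of rows, only the bucket boundaries need to be stored, and the selectivity of a range is roughly the number of buckets it covers divided by the number of buckets. A minimal Python sketch of that estimate for col < c; linear interpolation inside the partly covered bucket is an assumption of this sketch, and the endpoints in the example are hypothetical:

```python
from bisect import bisect_left


def hb_selectivity_less_than(col_min, endpoints, c):
    """Estimate the fraction of non-NULL rows with col < c.

    endpoints[i] is the upper bound of bucket i; bucket 0 starts at col_min.
    Each bucket is assumed to hold 1/len(endpoints) of the rows.
    """
    n = len(endpoints)
    if c <= col_min:
        return 0.0
    if c > endpoints[-1]:
        return 1.0
    i = bisect_left(endpoints, c)  # first bucket whose upper bound is >= c
    lo = col_min if i == 0 else endpoints[i - 1]
    hi = endpoints[i]
    # Part of bucket i that lies below c, assuming values are uniform inside it.
    part = (c - lo) / (hi - lo) if hi > lo else 0.0
    return (i + part) / n


print(hb_selectivity_less_than(1, [2, 4, 6, 8], 5))  # 0.625
```

A width-balanced histogram would instead store a row count per equal-width value range; the height-balanced form spends its buckets where the rows actually are.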

Histogram properties

● Good for continuous, densely populated domains

− DATE[TIME], sequential identifiers, prices, counts, ...

● Not as good for sparse domains

− VARCHAR(100) CHARSET UTF8

● Not as good for highly skewed domains

− A list of the most popular values would work better

− Should still provide an estimate that's better than no estimate

● Can try different histogram settings:

set histogram_size=255, histogram_type='single_prec_hb';

set histogram_size=128, histogram_type='double_prec_hb';
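The two histogram types trade precision for space. As a simplified model (an assumption of this sketch, not a description of the on-disk format), suppose each endpoint is stored as its relative position between MIN and MAX, rounded to 1 byte for single_prec_hb or 2 bytes for double_prec_hb. The same byte budget (254 here, a hypothetical value within the 255 maximum) then buys half as many double-precision endpoints, but each one lands much closer to the true value:

```python
def quantize(value, col_min, col_max, nbytes):
    """Store value's relative position in [col_min, col_max] in nbytes bytes."""
    levels = (1 << (8 * nbytes)) - 1  # 255 for 1 byte, 65535 for 2 bytes
    pos = (value - col_min) / (col_max - col_min)
    return round(pos * levels)


def dequantize(code, col_min, col_max, nbytes):
    """Map a stored code back to an approximate column value."""
    levels = (1 << (8 * nbytes)) - 1
    return col_min + code / levels * (col_max - col_min)


def buckets_for(histogram_size_bytes, nbytes):
    # One endpoint per nbytes bytes of the histogram.
    return histogram_size_bytes // nbytes


# Hypothetical DATE-like column stored as day numbers over roughly 7 years.
lo, hi, x = 0, 2556, 1000
for name, nbytes in (("single_prec_hb", 1), ("double_prec_hb", 2)):
    back = dequantize(quantize(x, lo, hi, nbytes), lo, hi, nbytes)
    print(name, buckets_for(254, nbytes), "buckets, error", abs(back - x))
```

In this model the single-precision endpoint is off by a couple of days, while the double-precision one is off by a small fraction of a day.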

EITS summary

● New kind of statistics in MariaDB 10.0

− Complements InnoDB's statistics

● Must be collected manually

− set histogram_size=255;

− analyze table tbl persistent for all;

● Must be enabled to be used (safe!)

− set use_stat_tables='preferably';

− set optimizer_use_condition_selectivity=4;

● Please report your experience!

Thanks!