Billion Goods in Few Categories
How Histograms Save a Life?
Sveta Smirnova
Percona


Table of Contents
• Introduction
• The Use Case
  • The Cardinality: Two Levels
  • Example
• Why the Difference?
• Even Worse Use Case
  • ANALYZE TABLE Limitations
  • Example
• How Histograms Work?
• Left Overs

Optimizer Statistics aka Histograms

"The column_statistics data dictionary table stores histogram statistics about column values, for use by the optimizer in constructing query execution plans."
MySQL User Reference Manual

Sveta Smirnova
• MySQL Support engineer
• Author of
  • MySQL Troubleshooting
  • JSON UDF functions
  • FILTER clause for MySQL
• Speaker
  • Percona Live, OOW, Fosdem, DevConf, HighLoad...

Introduction

Everything Can Be Resolved!
• Hardware
• Wise options
• Optimized queries
• Brain

Not Everything
• This talk is about
  • How I spent the last three years
  • Resolving the same issue
  • For different customers
• Task was to speed up the query

Not All the Queries Can Be Optimized
• Specific data distribution
• Access on different fields
  • ON goods.shop_id = shop.id
  • WHERE shop.location IN (...)
  • GROUP BY goods.category, shop.profile
  • ORDER BY shop.distance, goods.quantity
• Index cannot be used effectively

Latest Support Tickets
• Data distribution varies
  • Big difference between number of values
      Red 1,000,000 / Green 2 / Blue 100,000
  • Constantly changing
      Red 100,000 / Green 1,000,000 / Blue 10
      Red 1,000 / Green 2,000 / Blue 50,000
• Cardinality is not correct
  • Was not updated in time
  • Updates too often
  • Calculated wrongly
• Index maintenance is expensive
  • Hardware resources
  • Slow updates
  • Window to run CREATE INDEX
• Optimizer does not work as we wish

Examples in my talk @Percona Live Frankfurt

Disclaimer
• Topic based on real Support cases
  • Couple of them are still in progress
• All examples are 100% fake
  • They are created so that
    • No customer can be identified
    • Everything generated: table names, column names, data
  • Use case itself is fictional
• All examples are simplified
  • Only columns required to show the issue
  • Everything extra removed
  • Real tables usually store much more data
• All disasters happened with version 5.7

The Use Case

Two Tables
• categories
  • Less than 20 rows
• goods
  • More than 1M rows
  • 20 unique cat_id values
  • Many other fields
    • Price
    • Date: added, last updated, etc.
    • Characteristics
    • Store
    • ...

JOIN

select *
from goods
join categories on (categories.id = goods.cat_id)
where date_added between '2018-07-01' and '2018-08-01'
  and cat_id in (16,11)
  and price >= 1000 and price <= 10000 [ and ... ]
[ GROUP BY ... [ ORDER BY ... [ LIMIT ... ] ] ]
;

Option 1: Select from the Small Table First
• Select from the small table
• For each cat_id select from the large table
• Filter result on date_added [ and price [...] ]
• Slow with many items in the category

Option 1: Illustration
[Figure: Option 1 illustration]

Option 2: Select From the Large Table First
• Filter rows by date_added [ and price [...] ]
• Get cat_id values
• Retrieve rows from the small table
• Slow if the number of rows filtered by date_added is larger than the number of goods in the selected categories

Option 2: Illustration
[Figure: Option 2 illustration]
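The trade-off between the two options can be sketched with a toy row counter. This is only an illustration on made-up data, not how MySQL executes a join: the data set, the 90% skew toward category 1, and the function name rows_examined are all assumptions. It counts how many goods rows each option has to read before the second filter is applied.

```python
import random

def rows_examined(goods, cat_ids, date_from, date_to):
    """Return (option1_reads, option2_reads, matching_rows) for a toy goods table."""
    wanted = set(cat_ids)
    # Option 1: start from categories, read every goods row of each wanted
    # category, and only then filter on date_added.
    option1 = sum(1 for cat_id, _ in goods if cat_id in wanted)
    # Option 2: start from goods filtered by date_added, then check cat_id.
    option2 = sum(1 for _, added in goods if date_from <= added <= date_to)
    # Rows that satisfy both conditions, i.e. the real result size.
    matching = sum(1 for cat_id, added in goods
                   if cat_id in wanted and date_from <= added <= date_to)
    return option1, option2, matching

random.seed(1)
# Hypothetical data: 100,000 goods, category 1 holds about 90% of them,
# date_added is a day number in [0, 365).
goods = [(1 if random.random() < 0.9 else random.randint(2, 20),
          random.randrange(365)) for _ in range(100_000)]

popular = rows_examined(goods, [1], 0, 6)     # huge category, one week
rare = rows_examined(goods, [2, 3], 0, 300)   # small categories, long date range
print("popular category, short range:", popular)
print("rare categories, long range:  ", rare)
```

With a huge category and a short date range Option 2 reads far fewer rows; with small categories and a long date range Option 1 wins. The optimizer has to guess which situation it is in.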

What if We Use Combined Indexes?
• CREATE INDEX index_everything(cat_id, date_added[, price[, ...]])
• It resolves the issue
• But not in all cases

The Problem
• Maintenance cost
  • Slower INSERT/UPDATE/DELETE
  • Disk space
• Index not useful for selecting rows

  JOIN categories ON (categories.id=goods.cat_id)
  JOIN shops ON (shops.id=goods.shop_id)
  [ JOIN ... ]
  WHERE
    date_added between '2018-07-01' and '2018-08-01'
    AND cat_id in (16,11) AND price >= 1000 AND price <= 10000 [ AND ... ]
  GROUP BY product_type
  ORDER BY date_updated DESC
  LIMIT 50,100

• Tables may have wrong cardinality

The Use Case
The Cardinality: Two Levels

MySQL Architecture
The Query → Parser → Optimizer → Storage Engine → Data

MySQL is Layered Architecture
• Optimizer
• Engine
  • MyRocks
  • InnoDB
  • Any

Cardinality
• Number of unique values in the index
• Optimizer uses it for the query execution plan
• Example
  • ID: 1,2,3,4,5
    • Number of rows: 5
    • Cardinality: 5
  • Gender: m,f,f,f,f,m,m,m,m,m,m,f,f,m,f,m,f
    • Number of rows: 17
    • Cardinality: 2
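The two cardinality examples can be checked with a few lines of Python. This is just the definition (count of distinct values) applied to the sample columns from the slide. The helper name cardinality is ours, and real engines estimate this number by sampling rather than scanning every row.

```python
def cardinality(values):
    """Number of distinct values in a column (the exact figure an index estimates)."""
    return len(set(values))

ids = [1, 2, 3, 4, 5]
gender = "m,f,f,f,f,m,m,m,m,m,m,f,f,m,f,m,f".split(",")

for name, column in (("ID", ids), ("Gender", gender)):
    rows = len(column)
    card = cardinality(column)
    # rows / cardinality is the average number of rows behind one distinct value
    print(f"{name}: rows={rows} cardinality={card} rows_per_value={rows / card:.1f}")

# ID: rows=5 cardinality=5 rows_per_value=1.0
# Gender: rows=17 cardinality=2 rows_per_value=8.5
```

A low cardinality means every index lookup returns many rows, which is why an index on a column like Gender, or on cat_id with 20 values, filters poorly.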

InnoDB: Overview
• Stores statistics on disk
  • mysql.innodb_table_stats
  • mysql.innodb_index_stats
• Returns statistics to Optimizer
  • In ha_innobase::info
  • handler/ha_innodb.cc
• When opens table
  • flag = HA_STATUS_CONST
  • Reads data from disk
  • Stores it in memory
• Subsequent table accesses
  • flag = HA_STATUS_VARIABLE
  • Statistics from memory
  • Up to date Primary Key data

InnoDB: Flow
• Table created with option STATS_AUTO_RECALC = 0
• Before ANALYZE TABLE

  mysql> show index from test\G
  ...
  *************************** 2. row ***************************
          Table: test
     Non_unique: 1
       Key_name: f1
   Seq_in_index: 1
    Column_name: f1
      Collation: A
    Cardinality: 64
  ...

• The same row of SHOW INDEX output at each later step differs only in Cardinality:

  Step                     Cardinality
  Before ANALYZE TABLE     64
  After ANALYZE TABLE      2
  After inserting rows     16
  After restart            2

Optimizer: Overview
• Takes data from the engine
• Class ha_statistics
  • sql/handler.h
• Does not have Cardinality field at all
• Uses formula to calculate Cardinality

Optimizer: Formula
• n_rows: number of rows in the table
  • Naturally up to date
  • Constantly changing!
• rec_per_key: number of duplicates per key
  • Calculated by InnoDB at the time of ANALYZE
  • rec_per_key = n_rows / unique values
  • Does not change!
• Cardinality = n_rows / rec_per_key
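The effect of the formula can be reproduced with a toy model. This is a sketch, not InnoDB or optimizer code: the class IndexStats and the row counts are hypothetical, chosen so that the reported cardinality goes 2, 16, 2 like the SHOW INDEX outputs earlier. The only assumption taken from the slides is that rec_per_key is frozen at ANALYZE time while n_rows is live. In the slides the value returned to 2 after a restart; here a second analyze() stands in for that reset.

```python
class IndexStats:
    """Toy model: rec_per_key is frozen at ANALYZE time, n_rows is always current."""

    def __init__(self, values):
        self.values = list(values)
        self.analyze()

    def analyze(self):
        # rec_per_key = n_rows / unique values, computed only here
        self.rec_per_key = len(self.values) / len(set(self.values))

    def insert(self, new_values):
        # n_rows grows, rec_per_key stays as it was at the last analyze()
        self.values.extend(new_values)

    @property
    def cardinality(self):
        # Cardinality = n_rows / rec_per_key, recomputed on every access
        return len(self.values) / self.rec_per_key


stats = IndexStats([0, 1] * 50)       # 100 rows, 2 unique values
after_analyze = stats.cardinality     # 100 / 50
stats.insert([0, 1] * 350)            # 800 rows, still 2 unique values
after_inserts = stats.cardinality     # 800 / 50
stats.analyze()
after_reanalyze = stats.cardinality   # 800 / 400
print(after_analyze, after_inserts, after_reanalyze)  # 2.0 16.0 2.0
```

The number of unique values never changed, yet the reported cardinality grew eightfold simply because the table grew eightfold after the last ANALYZE.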

Persistent Statistics Are Not Persistent
• Engine stores persistent statistics

                InnoDB
    Storage     Tables
    Statistics  As Calculated
    Row Count   Only in Memory

• Optimizer calculates Cardinality every time it accesses engine statistics
• Weak user control

The Use Case
Example

Example
• EXPLAIN without histograms

mysql> explain select goods.* from goods
    -> join categories on (categories.id=goods.cat_id)
    -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
    -> and
    -> date_added between '2000-01-01' and '2001-01-01' -- Large range
    -> order by goods.cat_id
    -> limit 10\G -- We ask for 10 rows only!
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: categories -- Small table first
   partitions: NULL
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 20
     filtered: 70.00
        Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: goods -- Large table
   partitions: NULL
         type: ref
possible_keys: cat_id_2
          key: cat_id_2
      key_len: 5
          ref: orig.categories.id
         rows: 51827
     filtered: 11.11 -- Default value
        Extra: Using where
2 rows in set, 1 warning (0.01 sec)

• Execution time without histograms

mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> select goods.* from goods
    -> join categories on (categories.id=goods.cat_id)
    -> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)
    -> and
    -> date_added between '2000-01-01' and '2001-01-01'
    -> order by goods.cat_id
    -> limit 10;
ab9f9bb7bc4f357712ec34f067eda364 -
10 rows in set (56.47 sec)

• Engine statistics without histogramsmysql> show status like ’Handler%’;

+----------------------------+--------+

| Variable_name | Value |

+----------------------------+--------+

...

| Handler_read_next | 964718 |

| Handler_read_prev | 0 |

| Handler_read_rnd | 10 |

| Handler_read_rnd_next | 951671 |

...

| Handler_write | 951670 |

+----------------------------+--------+

18 rows in set (0.01 sec)

Example

• Now let's add the histogram

mysql> analyze table goods update histogram on date_added;

+------------+-----------+----------+------------------------------+

| Table | Op | Msg_type | Msg_text |

+------------+-----------+----------+------------------------------+

| orig.goods | histogram | status | Histogram statistics created

for column ’date_added’. |

+------------+-----------+----------+------------------------------+

1 row in set (2.01 sec)

Example

• EXPLAIN with the histogram

mysql> explain select goods.* from goods

-> join categories

-> on (categories.id=goods.cat_id)

-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)

-> and

-> date_added between '2000-01-01' and '2001-01-01'

-> order by goods.cat_id

-> limit 10\G

Example

• EXPLAIN with the histogram

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: goods -- Large table first

partitions: NULL

type: index

possible_keys: cat_id_2

key: cat_id_2

key_len: 5

ref: NULL

rows: 10 -- Same as we asked

filtered: 98.70 -- True numbers

Extra: Using where

Example

• EXPLAIN with the histogram

*************************** 2. row ***************************

id: 1

select_type: SIMPLE

table: categories -- Small table

partitions: NULL

type: eq_ref

possible_keys: PRIMARY

key: PRIMARY

key_len: 4

ref: orig.goods.cat_id

rows: 1

filtered: 100.00

Extra: Using index

2 rows in set, 1 warning (0.01 sec)

Example

• Execution time with the histogram

mysql> flush status;

Query OK, 0 rows affected (0.00 sec)

mysql> select goods.* from goods

-> join categories on (categories.id=goods.cat_id)

-> where cat_id in (20,2,18,4,16,6,14,1,12,11,10,9,8,17)

-> and

-> date_added between '2000-01-01' and '2001-01-01'

-> order by goods.cat_id

-> limit 10;

eeb005fae0dd3441c5c380e1d87fee84 -

10 rows in set (0.00 sec) -- 56/0 times faster!

Example

• Engine statistics with the histogram

mysql> show status like 'Handler%';

+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 4     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 1     |
| Handler_read_key           | 3     |
| Handler_read_last          | 0     |
| Handler_read_next          | 9     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+

18 rows in set (0.00 sec)

Example


Why the Difference?

[Chart: Indexes: Number of Items with Same Value. Axis ticks: 1 to 10 and 0 to 800.]

[Chart: Indexes: Cardinality. Axis ticks: 1 to 10 and 0 to 800.]

[Chart: Histograms: Number of Values in Each Bucket. Axis ticks: 1 to 10 and 0 to 800.]

[Chart: Histograms: Data in the Histogram. Axis ticks: 1 to 10 and 0 to 1.]

Even Worse Use Case

Even Worse Use Case: ANALYZE TABLE Limitations

• Run ANALYZE TABLE often
• Use a large number of STATS_SAMPLE_PAGES

Solutions in 5.7-

• Counts the number of pages in the table
• Takes STATS_SAMPLE_PAGES pages as a sample
• Counts the number of unique values in the secondary index in these pages
• Divides the number of pages in the table by the number of sample pages and multiplies the result by the number of unique values

How ANALYZE TABLE Works with InnoDB?


• Number of pages in the table: 20,000
• STATS_SAMPLE_PAGES: 20 (default)
• Unique values in the secondary index:
  • In sample pages: 10
  • In the table: 11
• Cardinality: 20,000 * 10 / 20 = 10,000

Example


• Number of pages in the table: 20,000
• STATS_SAMPLE_PAGES: 5,000
• Unique values in the secondary index:
  • In sample pages: 10
  • In the table: 11
• Cardinality: 20,000 * 10 / 5,000 = 40

Example 2
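The arithmetic in the two examples can be reproduced with a few lines of Python. This is only a sketch of the simplified formula as the slides describe it, not InnoDB's actual sampling code; the function name `estimate_cardinality` and its parameters are made up for illustration.

```python
def estimate_cardinality(total_pages, sample_pages, distinct_in_sample):
    """Scale the distinct values seen in the sample up to the whole table.

    Simplified formula from the slides:
    pages in table * unique values in sample / sample pages.
    """
    return total_pages * distinct_in_sample / sample_pages


ACTUAL_DISTINCT = 11  # true number of unique values, taken from the slides

# Same inputs as the two examples: 20,000 pages, 10 unique values in the sample.
for sample_pages in (20, 5000):
    estimate = estimate_cardinality(20_000, sample_pages, 10)
    print(f"STATS_SAMPLE_PAGES={sample_pages}: "
          f"estimate {estimate:.0f}, actual {ACTUAL_DISTINCT}")
```

Even with 5,000 sample pages the estimate is still off from the 11 values really present, which is the limitation this part of the talk is about.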

• Time consuming

mysql> select count(*) from goods;

+----------+

| count(*) |

+----------+

| 80303000 |

+----------+

1 row in set (35.95 sec)

• With default STATS_SAMPLE_PAGES

mysql> analyze table goods;

+------------+---------+----------+----------+

| Table | Op | Msg_type | Msg_text |

+------------+---------+----------+----------+

| test.goods | analyze | status | OK |

+------------+---------+----------+----------+

1 row in set (0.32 sec)

• With bigger number

mysql> alter table goods STATS_SAMPLE_PAGES=5000;

Query OK, 0 rows affected (0.04 sec)

Records: 0 Duplicates: 0 Warnings: 0

mysql> analyze table goods;

+------------+---------+----------+----------+

| Table | Op | Msg_type | Msg_text |

+------------+---------+----------+----------+

| test.goods | analyze | status | OK |

+------------+---------+----------+----------+

1 row in set (27.13 sec)

• 27.13/0.32 = 85 times slower!

• Not always a solution

Use Larger STATS_SAMPLE_PAGES?


Even Worse Use Case: Example

• goods_characteristics

CREATE TABLE `goods_characteristics` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `good_id` varchar(30) DEFAULT NULL,
  `size` int(11) DEFAULT NULL,
  `manufacturer` varchar(30) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `good_id` (`good_id`,`size`,`manufacturer`),
  KEY `size` (`size`,`manufacturer`)
) ENGINE=InnoDB AUTO_INCREMENT=196606
DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Two Similar Tables

• goods_shops

CREATE TABLE `goods_shops` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `good_id` varchar(30) DEFAULT NULL,
  `location` varchar(30) DEFAULT NULL,
  `delivery_options` varchar(30) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `good_id` (`good_id`,`location`,`delivery_options`),
  KEY `location` (`location`,`delivery_options`)
) ENGINE=InnoDB AUTO_INCREMENT=131071
DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Two Similar Tables

• Size

mysql> select count(*) from goods_characteristics;

+----------+

| count(*) |

+----------+

| 131072 |

+----------+

1 row in set (0.08 sec)

mysql> select count(*) from goods_shops;

+----------+

| count(*) |

+----------+

| 65536 |

+----------+

1 row in set (0.04 sec)

Two Similar Tables

• Data Distribution: goods_characteristics

mysql> select count(*) num_rows, good_id, size

-> from goods_characteristics group by good_id, size;

+----------+---------+------+
| num_rows | good_id | size |
+----------+---------+------+
|    65536 | laptop  |    7 |
|     8187 | laptop  |    8 |
|     8190 | laptop  |    9 |
|     8188 | laptop  |   10 |
|     8192 | laptop  |   11 |
|     8189 | laptop  |   12 |
|     8189 | laptop  |   13 |
|     8191 | laptop  |   14 |
|     8190 | laptop  |   15 |
|       10 | laptop  |   16 |
|       10 | laptop  |   17 |
+----------+---------+------+

Two Similar Tables

• Data Distribution: goods_characteristics

mysql> select count(*) num_rows, good_id, manufacturer

-> from goods_characteristics group by good_id, manufacturer order by num_rows desc;

+----------+---------+--------------+
| num_rows | good_id | manufacturer |
+----------+---------+--------------+
|    65536 | laptop  | Noname       |
|     8191 | laptop  | Samsung      |
|     8191 | laptop  | Acer         |
|     8189 | laptop  | Dell         |
|     8189 | laptop  | HP           |
|     8189 | laptop  | Lenovo       |
|     8189 | laptop  | Toshiba      |
|     8189 | laptop  | Apple        |
|     8189 | laptop  | Asus         |
|       10 | laptop  | Sony         |
|       10 | laptop  | Casper       |
+----------+---------+--------------+

Two Similar Tables

• Data Distribution: goods_shops

mysql> select count(*) num_rows, good_id, location

-> from goods_shops group by good_id, location order by num_rows desc;

+----------+---------+---------------+
| num_rows | good_id | location      |
+----------+---------+---------------+
|     8191 | laptop  | New York      |
|     8191 | laptop  | San Francisco |
|     8189 | laptop  | Paris         |
|     8189 | laptop  | Berlin        |
|     8189 | laptop  | Brussels      |
|     8189 | laptop  | Tokio         |
|     8189 | laptop  | Istanbul      |
|     8189 | laptop  | London        |
|       10 | laptop  | Moscow        |
|       10 | laptop  | Kiev          |
+----------+---------+---------------+

Two Similar Tables

• Data Distribution: goods_shops

mysql> select count(*) num_rows, good_id, delivery_options

-> from goods_shops group by good_id, delivery_options order by num_rows desc;

+----------+---------+------------------+
| num_rows | good_id | delivery_options |
+----------+---------+------------------+
|     8192 | laptop  | DHL              |
|     8191 | laptop  | PTT              |
|     8190 | laptop  | Normal Post      |
|     8190 | laptop  | Tracked          |
|     8189 | laptop  | Fedex            |
|     8189 | laptop  | Gruzovichkof     |
|     8188 | laptop  | Courier          |
|     8187 | laptop  | No delivery      |
|       10 | laptop  | Premium          |
|       10 | laptop  | Urgent           |
+----------+---------+------------------+

Two Similar Tables

Histogram statistics are useful primarily for nonindexed columns. Adding an index to a column for which histogram statistics are applicable might also help the optimizer make row estimates. The tradeoffs are:

An index must be updated when table data is modified.

A histogram is created or updated only on demand, so it adds no overhead when table data is modified. On the other hand, the statistics become progressively more out of date when table modifications occur, until the next time they are updated.

MySQL User Reference Manual

Optimizer Statistics aka Histograms


mysql> alter table goods_characteristics stats_sample_pages=5000;

Query OK, 0 rows affected (0.02 sec)

Records: 0 Duplicates: 0 Warnings: 0

mysql> alter table goods_shops stats_sample_pages=5000;

Query OK, 0 rows affected (0.05 sec)

Records: 0 Duplicates: 0 Warnings: 0

mysql> analyze table goods_characteristics, goods_shops;

+----------------------------+---------+----------+----------+

| Table | Op | Msg_type | Msg_text |

+----------------------------+---------+----------+----------+

| test.goods_characteristics | analyze | status | OK |

| test.goods_shops | analyze | status | OK |

+----------------------------+---------+----------+----------+

2 rows in set (0.35 sec)

Index Statistics is More than Good

• The query

mysql> select count(*) from goods_shops join goods_characteristics

-> using (good_id)

-> where size < 12 and

-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')

-> and (location in ('Moscow', 'Kiev') or

-> delivery_options in ('Premium', 'Urgent'));

^C^C -- query aborted

ERROR 1317 (70100): Query execution was interrupted

Performance

• Handlers

mysql> show status like 'Handler%';

+----------------------------+-------------+

| Variable_name | Value |

+----------------------------+-------------+

| Handler_commit | 0 |

| Handler_delete | 0 |

| Handler_discover | 0 |

| Handler_external_lock | 4 |

| Handler_mrr_init | 0 |

| Handler_prepare | 0 |

| Handler_read_first | 1 |

| Handler_read_key | 13043 |

| Handler_read_last | 0 |

| Handler_read_next | 854,767,916 |

...

Performance

• Table order

mysql> explain select count(*) from goods_shops join goods_characteristics

-> using (good_id) where size < 12 and

-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')

-> and (location in ('Moscow', 'Kiev') or

-> delivery_options in ('Premium', 'Urgent'));

+----+-----------------------+-------+---------+--------+----------+---------------+

| id | table | type | key | rows | filtered | Extra |

+----+-----------------------+-------+---------+--------+----------+---------------+

| 1 | goods_characteristics | index | good_id | 131072 | 25.00 | Using... |

| 1 | goods_shops | ref | good_id | 65536 | 36.00 | Using... |

+----+-----------------------+-------+---------+--------+----------+---------------+

2 rows in set, 1 warning (0.00 sec)

Performance

• Table order matters

mysql> explain select count(*) from goods_shops straight_join goods_characteristics

-> using (good_id) where size < 12 and

-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')

-> and (location in ('Moscow', 'Kiev') or

-> delivery_options in ('Premium', 'Urgent'));

+----+-----------------------+-------+---------+--------+----------+---------------+

| id | table | type | key | rows | filtered | Extra |

+----+-----------------------+-------+---------+--------+----------+---------------+

| 1 | goods_shops | index | good_id | 65536 | 36.00 | Using... |

| 1 | goods_characteristics | ref | good_id | 131072 | 25.00 | Using... |

+----+-----------------------+-------+---------+--------+----------+---------------+

2 rows in set, 1 warning (0.00 sec)

Performance

• Table order matters

mysql> select count(*) from goods_shops straight_join goods_characteristics

-> using (good_id)

-> where size < 12 and

-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')

-> and (location in ('Moscow', 'Kiev') or

-> delivery_options in ('Premium', 'Urgent'));

+----------+

| count(*) |

+----------+

| 816640 |

+----------+

1 row in set (2.11 sec)

Performance

• Table order matters

mysql> show status like 'Handler_read_next';

+-------------------+-----------+

| Variable_name | Value |

+-------------------+-----------+

| Handler_read_next | 5,308,416 |

+-------------------+-----------+

1 row in set (0.00 sec)

Performance

• Not for all data

mysql> select count(*) from goods_shops straight_join goods_characteristics

-> using (good_id)

-> where (size > 15 or manufacturer in ('Sony', 'Casper'))

-> and location in

-> ('New York', 'San Francisco', 'Paris', 'Berlin', 'Brussels', 'London')

-> and delivery_options in

-> ('DHL','Normal Post', 'Tracked', 'Fedex', 'No delivery');

^C^C -- query aborted

ERROR 1317 (70100): Query execution was interrupted

Performance

• Not for all data

mysql> show status like 'Handler%';

+----------------------------+------------+

| Variable_name | Value |

+----------------------------+------------+

| Handler_commit | 10 |

| Handler_delete | 0 |

| Handler_discover | 0 |

| Handler_external_lock | 28 |

| Handler_mrr_init | 0 |

| Handler_prepare | 0 |

| Handler_read_first | 1 |

| Handler_read_key | 143 |

| Handler_read_last | 0 |

| Handler_read_next | 16,950,265 |

Performance


mysql> analyze table goods_shops update histogram

-> on location, delivery_options;

+-------------+-----------+----------+--------------------------------+

| Table | Op | Msg_type | Msg_text |

+-------------+-----------+----------+--------------------------------+

| goods_shops | histogram | status | Histogram statistics created

for column ’delivery_options’. |

| goods_shops | histogram | status | Histogram statistics created

for column ’location’. |

+-------------+-----------+----------+--------------------------------+

2 rows in set (0.18 sec)

Histograms to The Rescue


mysql> analyze table goods_characteristics update histogram

-> on size, manufacturer ;

+-----------------------+-----------+----------+------------------------------+

| Table | Op | Msg_type | Msg_text |

+-----------------------+-----------+----------+------------------------------+

| goods_characteristics | histogram | status | Histogram statistics created

for column ’manufacturer’. |

| goods_characteristics | histogram | status | Histogram statistics created

for column ’size’. |

+-----------------------+-----------+----------+------------------------------+

2 rows in set (0.23 sec)

Histograms to The Rescue

• The query

mysql> select count(*) from goods_shops join goods_characteristics

-> using (good_id)

-> where size < 12 and

-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')

-> and (location in ('Moscow', 'Kiev') or

-> delivery_options in ('Premium', 'Urgent'));

+----------+

| count(*) |

+----------+

| 816640 |

+----------+

1 row in set (2.16 sec)

Histograms to The Rescue

• The query

mysql> show status like 'Handler_read_next';

+-------------------+-----------+

| Variable_name | Value |

+-------------------+-----------+

| Handler_read_next | 5,308,418 |

+-------------------+-----------+

1 row in set (0.00 sec)

Histograms to The Rescue

• Filtering effect

mysql> explain select count(*) from goods_shops join goods_characteristics

-> using (good_id)

-> where size < 12 and

-> manufacturer in ('Lenovo', 'Dell', 'Toshiba', 'Samsung', 'Acer')

-> and (location in ('Moscow', 'Kiev') or

-> delivery_options in ('Premium', 'Urgent'));

+----+-----------------------+-------+---------+--------+----------+----------+

| id | table | type | key | rows | filtered | Extra |

+----+-----------------------+-------+---------+--------+----------+----------+

| 1 | goods_shops | index | good_id | 65536 | 0.06 | Using... |

| 1 | goods_characteristics | ref | good_id | 131072 | 15.63 | Using... |

+----+-----------------------+-------+---------+--------+----------+----------+

2 rows in set, 1 warning (0.00 sec)

Histograms to The Rescue
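In EXPLAIN output, `rows` multiplied by `filtered` (as a percentage) estimates how many rows of a table are passed on to the next table in the join. A small Python sketch, using the `goods_shops` numbers copied from the EXPLAIN outputs in this example, shows how much the histogram changes that estimate. The helper name `rows_passed_on` is hypothetical, and this is a simplified reading of the numbers, not the optimizer's full cost model.

```python
def rows_passed_on(rows, filtered_percent):
    """Estimated rows that survive the table's conditions: rows * filtered / 100."""
    return rows * filtered_percent / 100.0


# goods_shops read first: rows=65536 in both plans, only `filtered` differs.
without_histogram = rows_passed_on(65536, 36.00)  # default guess
with_histogram = rows_passed_on(65536, 0.06)      # estimate from histograms

print(f"without histograms: about {without_histogram:.0f} rows joined to the next table")
print(f"with histograms:    about {with_histogram:.0f} rows joined to the next table")
```

With the histogram in place the optimizer expects only a few dozen `goods_shops` rows to reach the join, which is consistent with it now choosing `goods_shops` as the first table without a `straight_join` hint.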


How Histograms Work?

↓ sql/sql_planner.cc

↓ calculate_condition_filter

↓ Item_func_*::get_filtering_effect

• get_histogram_selectivity

• Seen as a percent of filtered rows in EXPLAIN

Low Level


• Example data

mysql> create table example(f1 int) engine=innodb;

mysql> insert into example values(1),(1),(1),(2),(3);

mysql> select f1, count(f1) from example group by f1;

+------+-----------+

| f1 | count(f1) |

+------+-----------+

| 1 | 3 |

| 2 | 1 |

| 3 | 1 |

+------+-----------+

3 rows in set (0.00 sec)

•With the histogram

Filtered Rows

• Without a histogram

mysql> explain select * from example where f1 > 0\G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: example

partitions: NULL

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 5

filtered: 33.33

Extra: Using where

1 row in set, 1 warning (0.00 sec)

•With the histogram

Filtered Rows

• Without a histogram

mysql> explain select * from example where f1 > 1\G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: example

partitions: NULL

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 5

filtered: 33.33

Extra: Using where

1 row in set, 1 warning (0.00 sec)

•With the histogram

Filtered Rows

• Without a histogram

mysql> explain select * from example where f1 > 2\G

*************************** 1. row ***************************

id: 1

select_type: SIMPLE

table: example

partitions: NULL

type: ALL

possible_keys: NULL

key: NULL

key_len: NULL

ref: NULL

rows: 5

filtered: 33.33

Extra: Using where

1 row in set, 1 warning (0.00 sec)


mysql> explain select * from example where f1 > 3\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: example
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
     filtered: 33.33
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
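
Without a histogram, EXPLAIN reports filtered: 33.33 for every constant tried, although the number of matching rows changes each time. A minimal Python sketch, using only the five values from the example data, puts that fixed guess next to the fraction of rows that really match. The helper name `actual_filtered` is hypothetical, and 33.33 is simply copied from the EXPLAIN output.

```python
# Rows inserted into `example` on the "Example data" slide.
ROWS = [1, 1, 1, 2, 3]

# Value EXPLAIN reported without a histogram, for every constant tried.
GUESSED_FILTERED = 33.33


def actual_filtered(rows, bound):
    """Percentage of rows that really satisfy f1 > bound."""
    matching = sum(1 for value in rows if value > bound)
    return round(100.0 * matching / len(rows), 2)


for bound in range(4):
    print(f"f1 > {bound}: guessed {GUESSED_FILTERED}, "
          f"actual {actual_filtered(ROWS, bound)}")
```

The guess is close for none of the four conditions, which is the gap a histogram is meant to close.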


• With the histogram

mysql> analyze table example update histogram on f1 with 3 buckets;
+-----------------+-----------+----------+-----------------------------------------------+
| Table           | Op        | Msg_type | Msg_text                                      |
+-----------------+-----------+----------+-----------------------------------------------+
| hist_ex.example | histogram | status   | Histogram statistics created for column 'f1'. |
+-----------------+-----------+----------+-----------------------------------------------+
1 row in set (0.03 sec)


mysql> select * from information_schema.column_statistics
    -> where table_name='example'\G
*************************** 1. row ***************************
SCHEMA_NAME: hist_ex
 TABLE_NAME: example
COLUMN_NAME: f1
  HISTOGRAM: {"buckets": [[1, 0.6], [2, 0.8], [3, 1.0]],
              "data-type": "int", "null-values": 0.0, "collation-id": 8,
              "last-updated": "2018-11-07 09:07:19.791470",
              "sampling-rate": 1.0, "histogram-type": "singleton",
              "number-of-buckets-specified": 3}
1 row in set (0.00 sec)


mysql> explain select * from example where f1 > 0\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: example
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
     filtered: 100.00 -- all rows
        Extra: Using where
1 row in set, 1 warning (0.00 sec)


mysql> explain select * from example where f1 > 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: example
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
     filtered: 40.00 -- 2 rows
        Extra: Using where
1 row in set, 1 warning (0.00 sec)


mysql> explain select * from example where f1 > 2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: example
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
     filtered: 20.00 -- one row
        Extra: Using where
1 row in set, 1 warning (0.00 sec)


mysql> explain select * from example where f1 > 3\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: example
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5
     filtered: 20.00 -- one row
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
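
The filtered values with the histogram can be traced back to the cumulative frequencies stored in its buckets. The Python sketch below is an illustration of that arithmetic, not the server's implementation: the function name is hypothetical, and the lower bound of one row is an assumption made here to match the 20.00 shown for f1 > 3, where the raw bucket arithmetic alone would give 0.

```python
# Singleton buckets copied from information_schema.column_statistics:
# each pair is [value, cumulative frequency].
BUCKETS = [[1, 0.6], [2, 0.8], [3, 1.0]]
TABLE_ROWS = 5  # "rows" reported by EXPLAIN


def filtered_greater_than(buckets, bound, table_rows):
    """Estimate the EXPLAIN 'filtered' percentage for f1 > bound."""
    # Cumulative frequency of the largest bucket value that is <= bound.
    at_or_below = 0.0
    for value, cumulative in buckets:
        if value <= bound:
            at_or_below = cumulative
    selectivity = 1.0 - at_or_below
    # Assumption: the estimate is never allowed to drop below one row.
    selectivity = max(selectivity, 1.0 / table_rows)
    return round(100.0 * selectivity, 2)


for bound in range(4):
    print(f"f1 > {bound}: filtered "
          f"{filtered_greater_than(BUCKETS, bound, TABLE_ROWS)}")
```

Under these assumptions the sketch gives 100, 40, 20 and 20 percent for the four conditions, the same values as the EXPLAIN output above.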


[Chart: x-axis values 1, 2, 3; y-axis from 0 to 2]

Indexes: Cardinality


[Chart: x-axis values 1, 2, 3; y-axis from 0 to 1]

Histograms
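
An index and a histogram describe the same column in different ways. As a rough Python sketch over the five example rows: cardinality reduces the column to one number (the count of distinct values), from which only an average rows-per-value can be derived, while a singleton histogram keeps a cumulative frequency for each value. The function names are hypothetical, and treating the index-side estimate as rows divided by cardinality is a simplifying assumption, not a description of the server code.

```python
from collections import Counter

# Rows inserted into `example` on the "Example data" slide.
ROWS = [1, 1, 1, 2, 3]


def index_view(rows):
    """Cardinality and the average rows-per-value it implies."""
    cardinality = len(set(rows))
    return cardinality, len(rows) / cardinality


def singleton_histogram(rows):
    """[value, cumulative frequency] pairs, one bucket per distinct value."""
    counts = Counter(rows)
    buckets, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        buckets.append([value, running / len(rows)])
    return buckets


print(index_view(ROWS))
print(singleton_histogram(ROWS))
```

The histogram produced this way has the same buckets as the one stored in information_schema.column_statistics for the example table.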


Left Overs


                 Histograms        Indexes
Maintained by    Optimizer         Storage Engine
Updated          On Demand         On every DML ∗
Storage          Light             Heavy
Optimizer Uses   Real Numbers ∗∗   Cardinality

∗ Unless persistent statistics used
∗∗ For up to 1024 buckets

Histograms vs Indexes


• CREATE INDEX
  • Metadata lock
  • Can be blocked by any query

• UPDATE HISTOGRAM
  • Backup lock
  • Can be blocked only by a backup
  • Can be created any time without fear

Maintenance: Locking


• CREATE INDEX
  • Locks writes
  • Locks reads ∗
  • Every DML updates the index

• UPDATE HISTOGRAM
  • Uses up to histogram_generation_max_mem_size
  • Persistent after creation
  • DML does not touch it

∗ PS-2503
  Before Percona Server 5.6.38-83.0/5.7.20-18
  Upstream

Maintenance: Load


• Helps if query plan can be changed
• Not a replacement for the index:
  • GROUP BY
  • ORDER BY
  • Query on a single table ∗

∗ Only if filtering effect can change the plan

Histograms


• Data distribution is uniform
• Range optimization can be used
• Full table scan is fast

When Are Histograms Not Helpful?
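
The first point can be illustrated with a small Python sketch. It compares the share of rows per value, which is what a singleton histogram records, with a plain one-over-distinct-values guess. The uniform data set and both function names are hypothetical; the skewed rows are the ones from the example table.

```python
import math
from collections import Counter

UNIFORM_ROWS = [1, 2, 3] * 4   # hypothetical: every value equally frequent
SKEWED_ROWS = [1, 1, 1, 2, 3]  # rows from the example table


def selectivity_from_counts(rows, value):
    """Share of rows equal to value, as per-value frequencies would give it."""
    return Counter(rows)[value] / len(rows)


def selectivity_uniform_guess(rows):
    """Share of rows per value if every distinct value were equally common."""
    return 1 / len(set(rows))


for name, rows in (("uniform", UNIFORM_ROWS), ("skewed", SKEWED_ROWS)):
    exact = selectivity_from_counts(rows, 1)
    guess = selectivity_uniform_guess(rows)
    print(name, round(exact, 2), round(guess, 2), math.isclose(exact, guess))
```

On the uniform data both numbers agree, so per-value frequencies add nothing; on the skewed data they differ, which is where a histogram can change the estimate.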


• Index statistics are collected by the engine
• Optimizer calculates cardinality each time it accesses statistics
• Indexes don’t always improve performance
• Histograms can help
  • Still a new feature
• Histograms do not replace other optimizations!

Conclusion


Rate My Session!


Percona’s open source database experts are true superheroes, improving database performance for customers across the globe.

Discover what it means to have a Percona career with the smartest people in the database performance industries, solving the most challenging problems our customers come across.

We’re Hiring!


Thank You