Page 1: Optimizing mysql databases - New Bughouse Database

John Nahlen

Revised Edition

Copyright © 2011 John Nahlen

OPTIMIZING MYSQL DATABASES

ABOUT ME

• Graduated from Centennial High School, Boise

• Pending member of The National Society of Leadership and Success

• Work at Smith Optics, Ketchum, ID

• In my free time:

• Church

• Programming

• Java

• I host a few open source projects on Google Code

• PHP, MySQL

• Real Time Strategy games

• Business Professionals of America

• Won Regional, State, National Awards

INTRODUCTION TO MY APPLICATION

• Bughouse – a variant of chess

• Two boards, four players

• Timed, fast-paced game

• Games are played on a chess server, http://www.freechess.org

• I store these games on http://www.bughousedb.com

• Stored: games, moves, comments, users

• Not a business – no income from this app except for donations

INTRODUCTION TO MY APPLICATION – THE PROBLEM

• Note that I am attempting to tackle monster problems

• My passion and resolve stay the same

• Search as much relevant data as possible in the shortest amount of time

• Example: Trying to match 80,000,000+ (and growing) unique strings of a fixed length using REGEXP BINARY

• Example: Attempting to manage and optimize tables with 125,000,000+ rows

• And I know I’m not the only one!

TERMINOLOGY

• I use the following terms several times during this presentation:

• ACID: Atomicity, Consistency, Isolation, Durability

• Business Rules: Requirements of the application (as defined by project managers and so forth)

• Index Cardinality: The number of distinct values in an index; the lower it is, the more repetitive the data
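
As a rough illustration of index cardinality, the statements below show one way to inspect it. The table and column names (moves, game_id) are assumptions made for this sketch, not the actual Bughouse schema.

```sql
-- Hypothetical table and column names, for illustration only.
-- The Cardinality column is MySQL's estimate of distinct values per index.
SHOW INDEX FROM moves;

-- Exact ratio of distinct values to rows for one column
-- (close to 1 = nearly unique, close to 0 = highly repetitive).
SELECT COUNT(DISTINCT game_id) / COUNT(*) AS selectivity
FROM moves;
```

The second query scans the whole table, so on very large tables it is better run against a sample or during quiet hours.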

SOME OPENING STATISTICS (as of 06/24/2011)

• Production (constantly growing)

• 311,767 games

• 27,678,752 moves (5.3 GB)

• Windows Home Server 2008 R2 Standard – 64 bit

• HP Compaq Desktop, AMD Athlon Dual Core Processor 4450e 2.30 GHz, 4 GB RAM

• Purchase Year: 2010

• Import-In-Progress (permanently archived)

• 1,334,370 games

• 139,269,313 moves (36.3 GB)

• Windows 7 Home Premium – 64 bit

• HP Compaq Laptop, AMD Athlon Dual Core QL-62 2.00 GHz, 4 GB RAM

• Purchase Year: Feb 2009

STORAGE ENGINES (BASIC OVERVIEW)

• MyISAM

• Well-rounded engine; the default before MySQL 5.5 (InnoDB is the default from 5.5 onward)

• Good for read-heavy applications

• InnoDB

• Great features and ACID compliant

• Transactions

• Foreign keys

• Designed for CPU efficiency when processing large data volumes

• Good for write-heavy applications

• Many more configuration options

STORAGE ENGINES (BASIC OVERVIEW)

• MEMORY

• Supports Hash indexing

• Stores data in RAM

• Very fast access (See footnote)

• Indexes also stored in RAM

• If you have the memory to spare

• Limited in size to max_heap_table_size config

• More on this engine in a moment

http://en.wikipedia.org/wiki/List_of_device_bit_rates#Memory_Interconnect.2FRAM_buses

STORAGE ENGINES

• Later in the presentation, I will discuss additional engines

• Not included in the default MySQL build

• Picking the right engine for the application is no easy task

• Ultimately depends on the business rules
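
When comparing engines, it helps to see what the server offers and what a table currently uses. A minimal sketch, assuming a hypothetical table named games:

```sql
-- List the engines compiled into this server and which one is the default.
SHOW ENGINES;

-- Check which engine an existing table uses (hypothetical table name).
SHOW TABLE STATUS LIKE 'games';

-- Convert the table to another engine; this rebuilds the whole table.
ALTER TABLE games ENGINE = InnoDB;
```

The conversion copies every row, so on a table with millions of rows it is worth trying on a copy first.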

MEMORY ENGINE – THE SOLUTION TO ALL YOUR PROBLEMS?

• Not quite.

• RAM is extremely expensive per gigabyte compared to disk

• Data is not persistent, as RAM is volatile storage

• On power outage, shutdown, etc., all data in RAM is lost

• Significantly (100x-1000x) faster than a hard drive

• Memory bandwidth is limited by the CPU and bus speeds

• Remember that 32-bit OSes can only address around 3 GB of memory
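
One way to work with these limits is to keep the authoritative data on disk and load only a hot subset into a MEMORY table. The sketch below assumes a hypothetical games table with id, result and played_on columns; the 256 MB limit is an arbitrary example value.

```sql
-- Raise the per-table size limit for MEMORY tables created in this session
-- (256 MB here; the value is an arbitrary example).
SET SESSION max_heap_table_size = 268435456;

-- Build an in-RAM copy of frequently read rows from a disk-based table.
-- Table and column names are hypothetical.
CREATE TABLE hot_games ENGINE = MEMORY
SELECT id, result, played_on
FROM games
WHERE played_on >= '2011-01-01';

-- Hash index for exact-match lookups on the copy.
ALTER TABLE hot_games ADD INDEX idx_id USING HASH (id);
```

The table definition survives a server restart but its rows do not, so the copy has to be reloaded from the disk table each time.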

FULL TABLE SCANS VS RANGE SCANS

• Full Table Scan

• Used when no usable index is available

• Must iterate through every row in the table

• Usually slow on large tables

• Range Scan

• Primary difference from a Full Table Scan is that it uses an index to narrow the rows read

• The more selective the filters, the faster it goes

• Example: SELECT * FROM mytable WHERE x = 5 AND y = 3

• If x is indexed, MySQL finds the rows with x = 5 first

• If there is no index on x but there is one on y, it runs the y = 3 search first

• MySQL then applies the remaining filter to the rows that matched
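
To see which kind of scan MySQL picks for the example, EXPLAIN can be run before and after adding an index. The index name idx_x_y is an assumption for this sketch.

```sql
-- Hypothetical composite index covering both filters.
CREATE INDEX idx_x_y ON mytable (x, y);

-- In the output, type = ALL means a full table scan;
-- ref or range means an index narrowed the rows read.
-- The key column names the index chosen; rows is the estimated row count.
EXPLAIN SELECT * FROM mytable WHERE x = 5 AND y = 3;

-- A range condition on the leading column is also eligible to use the index.
EXPLAIN SELECT * FROM mytable WHERE x BETWEEN 5 AND 10;
```

The optimizer may still choose a full scan if it estimates that most rows match, so the output is worth checking against real data.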

WHAT IS AN INDEX AND WHY USE THEM?

• An index is a structure that maps column values to row pointers

• Quickly eliminates rows that don’t match the indexed column

• Improves speed of SELECT statements

• Fewer rows to scan through to match WHERE clause criteria (See Range Scans)

• When an index is not available, a full table scan is required

• Searching on an arbitrary, un-indexed column

• Wrapping an indexed column in a function in the WHERE clause prevents use of its index and forces a full table scan

• Forces a full table scan: SELECT * FROM mytable WHERE DATE(mydate) > '2011-01-01'

• Can use an index on mydate: SELECT * FROM mytable WHERE mydate > '2011-01-01'
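
One caveat about the two queries above: if mydate is a DATETIME column (an assumption here), they do not return exactly the same rows, because the second also matches times later on 2011-01-01. A sketch of an index-friendly rewrite that keeps the meaning of the DATE() version:

```sql
-- Original form: DATE() is applied to every row, so an index on mydate is not used.
SELECT * FROM mytable WHERE DATE(mydate) > '2011-01-01';

-- Equivalent form for a DATETIME column: "date part after 2011-01-01"
-- is the same as "at or after midnight on 2011-01-02".
SELECT * FROM mytable WHERE mydate >= '2011-01-02 00:00:00';
```

The general idea is to move the computation to the constant side of the comparison and leave the column bare.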

INDEXING

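• B-Tree is NOT Binary Tree!

• Indexes are applied to the columns in the WHERE clause

• Multiple-column vs. Single-column indexes

• Null values can be indexed

• B-Tree lookups must use a leftmost prefix of the index (the leading columns, or the start of a string)

• The values compared against must be constants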
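
The prefix rule is easiest to see with a multiple-column index. In this sketch the columns a, b, c and name, and the index idx_a_b_c, are hypothetical; an index on name is also assumed.

```sql
-- Hypothetical three-column index.
CREATE INDEX idx_a_b_c ON mytable (a, b, c);

-- Can use the index: the conditions cover a leftmost prefix, (a) or (a, b).
SELECT * FROM mytable WHERE a = 1;
SELECT * FROM mytable WHERE a = 1 AND b = 2;

-- Cannot use it for a lookup: the leading column a is missing.
SELECT * FROM mytable WHERE b = 2 AND c = 3;

-- The same rule applies inside a string column (assumes an index on name).
SELECT * FROM mytable WHERE name LIKE 'abc%';   -- start of string known, index usable
SELECT * FROM mytable WHERE name LIKE '%abc';   -- leading wildcard, index not usable
```

This is why one well-ordered multiple-column index can often replace several single-column ones.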

INDEXING (TYPES OF)

• Hash Indexing

• Very fast for exact comparisons (=, <=>) (See footnote 1)

• Supported by few engines

• Think of it as a key/value dictionary

• O(1) lookup

• B-Tree Indexing

• The default index type

• Best for relative comparisons ( < , > )

• Only useful index type on columns in GROUP BY, ORDER BY clauses

• O(log N) lookup

• Fulltext Indexing (MyISAM Only) - Not covered in this presentation

1 http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
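
A minimal sketch of choosing the index type per column, using a hypothetical MEMORY table (the table and column names are assumptions):

```sql
-- Hypothetical lookup table held in RAM.
-- MEMORY tables default to HASH indexes; BTREE must be requested explicitly.
CREATE TABLE player_lookup (
    player_id  INT UNSIGNED      NOT NULL,
    handle     VARCHAR(20)       NOT NULL,
    rating     SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY USING HASH (player_id),     -- exact match: WHERE player_id = ?
    INDEX idx_rating USING BTREE (rating)   -- ranges and sorting: WHERE rating > ? ORDER BY rating
) ENGINE = MEMORY;
```

On MyISAM and InnoDB the USING HASH clause is accepted but a B-Tree index is created anyway, so SHOW INDEX is the place to confirm what was actually built.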

VIEWS

• Do not support indexes of their own

• Views processed with the TEMPTABLE algorithm cannot use the base table’s indexes; MERGE views can

• Performance hits can be significant

• ALTERNATIVE: Storing in tables instead?

• Can be indexed (Good)

• Can use any storage engine (Good)

• May cause data integrity issues

• Depending on how they’re used

• Research Triggers in MySQL (Not covered here)
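
Whether a view can reach the base table's indexes depends on how MySQL processes it. A sketch, assuming a hypothetical games table with id, result and played_on columns:

```sql
-- Hypothetical view; with MERGE, the view's query is folded into the outer
-- query, so indexes on the base table remain usable.
CREATE ALGORITHM = MERGE VIEW recent_games AS
SELECT id, result, played_on
FROM games
WHERE played_on >= '2011-01-01';

-- EXPLAIN on a query against the view shows whether a base-table index is used.
EXPLAIN SELECT * FROM recent_games WHERE id = 42;
```

MySQL falls back to a temporary table (with a warning) when the view contains constructs such as GROUP BY, DISTINCT or aggregate functions, and that is the case where the performance hit shows up.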

SUBQUERIES

• Not ideal in most circumstances

• Make sure to use indexes

• Subqueries as derived tables

• Temporary tables to reuse when necessary

• Subqueries vs. self-joins vs. CASE … END/GROUP BY

• Example

• Relevant to my application

• Something to consider
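
As one illustration of the subquery versus join trade-off, the sketch below assumes hypothetical tables games (with primary key id) and comments (with a game_id column). It is not the query from my application.

```sql
-- Subquery form: on MySQL 5.5 this kind of IN subquery is often executed
-- once per outer row (shown as DEPENDENT SUBQUERY in EXPLAIN).
SELECT g.*
FROM games g
WHERE g.id IN (SELECT c.game_id FROM comments c);

-- Join form returning the same games; DISTINCT removes the duplicates
-- produced when a game has several comments.
SELECT DISTINCT g.*
FROM games g
JOIN comments c ON c.game_id = g.id;
```

Running EXPLAIN on both forms shows which one the server handles better for a given data set.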

SCHEMA OPTIMIZATION

• Many speed factors rely on database design

• On the other hand, over-normalization may have a negative effect

• Will expand this in the future

GENERAL QUERY OPTIMIZATION

• EXPLAIN is your friend (See footnote 1)

• Consider query caching

• Useless (and even detrimental) when the data changes frequently

• Different from Summary Tables

• Examine wait times for row/table locks for high concurrency applications

• Use the slow query log

• If possible, avoid nullable columns

• Consider fixed-length rows if possible (see FIXED TABLE LENGTH)

1 http://www.cs.ait.ac.th/laboratory/database/manual/manual_MySQL_Optimization.shtml#EXPLAIN
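
A short sketch of turning on the slow query log and checking the query cache from a client session. The 2-second threshold is an example value, and changing global variables requires the SUPER privilege.

```sql
-- Log statements that run longer than 2 seconds (threshold is an example value).
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;

-- Check whether the query cache is enabled and how it is performing.
SHOW VARIABLES LIKE 'query_cache%';
SHOW STATUS LIKE 'Qcache%';
```

Settings changed with SET GLOBAL are lost at restart, so anything worth keeping should also go into the MySQL config file.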

JOIN OPTIMIZATION

• Columns in both tables should be of the same type and length so that indexes are used efficiently

• CHAR and VARCHAR are considered the same if they are declared with the same length

• Joining TEXT with VARCHAR(n) will not use the index

• Joining (for example) INT(5) with INT(11) will use the index; the number is only a display width

• Joining (for example) VARCHAR(11) with VARCHAR(15) may not use the index efficiently

• Integer joins are faster than string joins

• String PRIMARY KEYs are discouraged (e.g. GUIDs)
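
The points above can be sketched with a hypothetical pair of tables where the string (a player handle) is stored once and everything else joins on an integer key. All names here are assumptions.

```sql
-- Hypothetical schema: both sides of the join use the same integer type.
CREATE TABLE players (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    handle  VARCHAR(20)  NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_handle (handle)
) ENGINE = InnoDB;

CREATE TABLE game_players (
    game_id    INT UNSIGNED NOT NULL,
    player_id  INT UNSIGNED NOT NULL,   -- same type as players.id
    PRIMARY KEY (game_id, player_id),
    KEY idx_player (player_id)
) ENGINE = InnoDB;

-- Integer join on identically typed, indexed columns.
SELECT p.handle, gp.game_id
FROM game_players gp
JOIN players p ON p.id = gp.player_id
WHERE p.handle = 'example_handle';
```

The string is compared once, to find the player, and the join itself runs on integers.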

IN CLAUSE OPTIMIZATION

• What is in the IN clause makes a big difference

• Constants are acceptable in the IN clause (See footnote 1)

SELECT * FROM mytable WHERE x IN (1,2,3,4)

• Indexed columns:

• The exact opposite is true

• You should avoid IN (See footnote 2) and use OR or UNION instead

• Avoid: SELECT * FROM mytable WHERE y IN (ColA, ColB, ColC, ColD)

• Prefer: SELECT * FROM mytable WHERE y = ColA OR y = ColB OR y = ColC OR y = ColD

1 http://stackoverflow.com/questions/782915/mysql-or-vs-in-performance

2 See #24 on http://forge.mysql.com/wiki/Top10SQLPerformanceTips
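
For completeness, this is what the UNION alternative to the column-list query could look like. It is a sketch only; whether it beats the OR form depends on the data and the optimizer, so both are worth checking with EXPLAIN. It assumes mytable has a primary key, so that no two rows are identical.

```sql
-- UNION form of the column-list query. Each branch has a single equality
-- condition. UNION (without ALL) removes rows matched by more than one branch.
SELECT * FROM mytable WHERE y = ColA
UNION
SELECT * FROM mytable WHERE y = ColB
UNION
SELECT * FROM mytable WHERE y = ColC
UNION
SELECT * FROM mytable WHERE y = ColD;
```

Each branch reads the table separately, so this form only pays off when every branch can be resolved cheaply.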

FIXED TABLE LENGTH

• Arguments for fixed-table length:

• “Fastest of the on-disk formats” (See footnote 1)

• Arguments against fixed-table length:

• http://forums.mysql.com/read.php?21,423433,423846#msg-423846

• http://forums.mysql.com/read.php?21,423433,423982#msg-423982

• Uses more disk space

• Cannot have BLOB or TEXT columns

• The MEMORY engine always uses fixed-length rows; this cannot be changed

• Choose the ideal column widths for your application

1 http://dev.mysql.com/doc/refman/5.0/en/static-format.html
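
A minimal sketch of a fixed-length MyISAM table. The table and its columns are hypothetical; the point is that every column has a fixed width.

```sql
-- Hypothetical MyISAM table with only fixed-width columns
-- (no VARCHAR, BLOB or TEXT), so rows are stored in the static format.
CREATE TABLE move_log (
    game_id   INT UNSIGNED      NOT NULL,
    ply       SMALLINT UNSIGNED NOT NULL,
    move_san  CHAR(8)           NOT NULL,
    PRIMARY KEY (game_id, ply)
) ENGINE = MyISAM ROW_FORMAT = FIXED;

-- The Row_format column should read Fixed.
SHOW TABLE STATUS LIKE 'move_log';
```

CHAR(8) always occupies its full width, which is where the extra disk space mentioned above comes from.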

SUMMARY TABLES

• A table that usually holds aggregate values computed from a much larger table

• Often useful when querying the larger table would take a lot of time

• As these are tables, all storage engines are available

• I call these tables “application cache”

• Most often used for generating reports (e.g. weekly, monthly)
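
A sketch of a monthly summary table, assuming a hypothetical games table with a played_on date column:

```sql
-- Hypothetical summary table: one row per month instead of one per game.
CREATE TABLE games_per_month (
    month_start  DATE         NOT NULL,
    game_count   INT UNSIGNED NOT NULL,
    PRIMARY KEY (month_start)
) ENGINE = MyISAM;

-- Build or refresh the aggregates; REPLACE overwrites months already present.
REPLACE INTO games_per_month (month_start, game_count)
SELECT DATE_FORMAT(played_on, '%Y-%m-01'), COUNT(*)
FROM games
GROUP BY DATE_FORMAT(played_on, '%Y-%m-01');

-- Reports read the small table instead of the large one.
SELECT month_start, game_count FROM games_per_month ORDER BY month_start;
```

The refresh can run on a schedule; how stale the numbers are allowed to get is a business rule.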

SHRINKING DATA

• Research the myisampack utility

• Use COMPRESS() and related functions

• If a column has low cardinality, move its distinct values to another table and reference them with a foreign key

• Only index columns that you plan on using

• Manually shrink the indexed prefix length for strings (See footnote 1)

1 See http://dev.mysql.com/doc/refman/5.0/en/create-index.html
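
A sketch of a prefix index and of COMPRESS(), assuming a hypothetical comments table with a text column body and a BLOB column body_compressed:

```sql
-- Index only the first 10 characters of a long string column
-- (hypothetical names; pick the length from how distinct the prefixes are).
CREATE INDEX idx_comment_prefix ON comments (body(10));

-- How many distinct values a 10-character prefix keeps, compared with the full column.
SELECT COUNT(DISTINCT LEFT(body, 10)) AS prefix_distinct,
       COUNT(DISTINCT body)           AS full_distinct
FROM comments;

-- COMPRESS() returns binary data, so the target column should be a BLOB or VARBINARY type.
UPDATE comments SET body_compressed = COMPRESS(body);
SELECT UNCOMPRESS(body_compressed) FROM comments LIMIT 1;
```

If prefix_distinct is close to full_distinct, the shorter index loses very little selectivity.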

WHY AREN’T MY INDEXES WORKING?

• See if you have a Hash index where you need a B-tree index

• Use B-tree for GROUP BY, ORDER BY clauses

• If the table is too small, MySQL may choose not to use indexes

• Columns with date values should almost always be B-Tree

• Dates are not stored as YYYY-MM-DD HH:MM:SS strings, only rendered that way

• The entire next slide demonstrates this

• Try using Index Hints
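
A sketch of index hints, assuming a hypothetical index named idx_mydate on mytable:

```sql
-- Refresh index statistics first; stale statistics can be a reason for a skipped index.
ANALYZE TABLE mytable;

-- USE INDEX tells the optimizer which indexes to consider.
SELECT * FROM mytable USE INDEX (idx_mydate)
WHERE mydate > '2011-01-01';

-- FORCE INDEX treats a table scan as very expensive, so the index is preferred.
SELECT * FROM mytable FORCE INDEX (idx_mydate)
WHERE mydate > '2011-01-01';
```

Hints stay in the query even after the data changes, so they are best treated as a last resort after checking EXPLAIN.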

WHY AREN’T MY INDEXES WORKING?

LOCKING

• Table Locking

• Default for most MySQL engines

• Useful when there are more reads than writes

• Many reader threads at once, or one writer thread

• Row Locking

• Many reader and writer threads at once, as long as they touch different rows

• Of the engines covered here, only InnoDB supports it

• Fewer locking conflicts

• Slower if many locks must be acquired
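
The server keeps counters that show how much time is lost waiting for locks. A minimal sketch for reading them:

```sql
-- Table-level locks: a high Table_locks_waited relative to
-- Table_locks_immediate suggests contention.
SHOW STATUS LIKE 'Table_locks%';

-- InnoDB row-lock waits.
SHOW STATUS LIKE 'Innodb_row_lock%';

-- Detailed InnoDB state, including recent lock waits and deadlocks.
SHOW ENGINE INNODB STATUS;
```

The counters are cumulative since server start, so comparing two readings taken some minutes apart says more than a single one.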

INSERTING ROWS

• LOAD DATA INFILE is the fastest

• Disable keys during the load and re-enable them afterwards

• Multi-row INSERT

• Locking tables with explicit flush/unlock

• Faster with even two or three inserts

• Indexes don’t have to be flushed to disk until tables unlocked

• Modify the bulk_insert_buffer_size variable in MySQL config file
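
The three insert techniques can be sketched as follows. The moves table, its columns and the file path are hypothetical, and the table is assumed to be MyISAM.

```sql
-- 1. Bulk load from a file, with non-unique index updates deferred.
ALTER TABLE moves DISABLE KEYS;
LOAD DATA INFILE '/tmp/moves.csv'
INTO TABLE moves
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(game_id, ply, move_san);
ALTER TABLE moves ENABLE KEYS;

-- 2. Multi-row INSERT: one statement, several rows.
INSERT INTO moves (game_id, ply, move_san) VALUES
    (1, 1, 'e4'),
    (1, 2, 'e5'),
    (1, 3, 'Nf3');

-- 3. Several single-row INSERTs under one explicit lock.
LOCK TABLES moves WRITE;
INSERT INTO moves (game_id, ply, move_san) VALUES (2, 1, 'd4');
INSERT INTO moves (game_id, ply, move_san) VALUES (2, 2, 'd5');
UNLOCK TABLES;
```

DISABLE KEYS only affects non-unique indexes on MyISAM tables; unique and primary keys are still maintained row by row.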

MYSQL/MYISAM CONFIGURATION

• Prefer 64-bit if you have more than 3 GB RAM

• 32-bit x86 systems can only address about this much

• The bigger, the better (depending on amount of data and available RAM)

• key_buffer_size

• read_buffer_size

• read_rnd_buffer_size

• sort_buffer_size

• Size depends on application requirements/business rules

• query_cache_size

• query_cache_type
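
These variables can be inspected and, for testing, changed from a client session. The sizes below are example values only, not recommendations.

```sql
-- Current values.
SHOW VARIABLES LIKE 'key_buffer_size';
SHOW VARIABLES LIKE '%buffer_size';

-- Key cache effectiveness for MyISAM: Key_reads (from disk) should be
-- small compared with Key_read_requests.
SHOW STATUS LIKE 'Key_read%';

-- Example values only. key_buffer_size is global; sort_buffer_size is
-- allocated per connection, so large values multiply quickly.
SET GLOBAL key_buffer_size = 536870912;    -- 512 MB
SET SESSION sort_buffer_size = 4194304;    -- 4 MB
```

Once a value proves itself, it belongs in the MySQL config file, since SET GLOBAL does not survive a restart.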

MYSQL CLUSTER / REPLICATION

• A way to load balance queries over multiple sets of physical hardware

• Could be called a “MySQL Cloud”

• From a performance standpoint, offers significant read speed advantages

• The advantage is biggest in internal networks

• On external networks, remember TCP/IP overhead and bandwidth

• Replication also provides data availability, security, and consistency

TOKUDB™ ENGINE

• Technology called Fractal Tree Indexes™

• Insertions 20x to 80x faster (quoted from website)

• Lays out indexes in a sequential manner

• Indexes don’t fragment

• Sequential I/O is faster on rotational media than random I/O

• This technology was developed at MIT

• Full ACID compliance

• Supports 5x-15x data compression by default

• Numerous additional features

TOKUDB™ ENGINE

• Commercial MySQL Engine

• Has a free version of 50 GB max*

• Only available on Linux for now

• My databases are only located on Windows right now

• See www.tokutek.com

• I would recommend checking this out

* For production uses. Free (no space limit) for evaluation, development purposes, and academic research.

PHYSICAL RECOMMENDATIONS

• The subject of performance of hardware is a key issue

• There is much hardware required to make a computer work

• I’ve thought about numerous possible hardware combinations

• Far too many to list in this presentation

• The possibilities are almost endless

• Databases are only as fast as the media they’re on

• Find the source of the problem (bottleneck)

• This would be better than blindly buying new hardware

PHYSICAL RECOMMENDATIONS

• The more resources, the better (up to a certain point)

• Consider the ratio between the resources available to MySQL and what it could use

• Don’t want to choke out other applications

• In the case of RAM, using too much would cause paging in the OS

• This would result in seriously degraded performance

PHYSICAL RECOMMENDATIONS DISK FRAGMENTATION - INTRODUCTION

• Occurs when pieces of a file are scattered across the hard drive

• Takes more seeks and rotations to find/read those files

• This is irrelevant for Flash Drives, SSD, or RAM

• Defragmentation rearranges each file into one contiguous block

• Consider purchasing commercial defragmentation software

• Not applicable to the TokuDB engine

• There are many solutions on the market

• I will discuss one product on the next slide

• You may want to do your own research as well

• Some argue the gain is negligible compared to creating a more efficient schema

PHYSICAL RECOMMENDATIONS DISK FRAGMENTATION – DISKEEPER 2011

• I have recently purchased Diskeeper Pro Premier 2011

• IFAAST will give a real performance boost for Disk-based databases

• Case study on Diskeeper + Microsoft SQL Server:

http://www.diskeeper.com/defrag/dk-boost-sql-server.aspx

• I no longer trust the Windows built-in defrag software

• This software is only available on Windows

Side Notes:

• Free, fully-functional 30-day trial

• Students with .edu e-mail addresses get fairly nice discounts

MYSQL + SOLID STATE (SSD)

• Solid State Drive (SSD) Basics

• No moving parts, no spin up time

• Most consumer SSDs store data on Flash memory

• Doesn’t require defragmentation

• All performance statistics are important

• Each one serves a different purpose

• For databases with random I/O, look for “IOPS”

• A low-end SSD should still be in the 10,000 IOPS range

• A hard disk runs at around 100-200 IOPS

MYSQL + SOLID STATE (SSD)

• Choose the host interface carefully

• We will discuss more about these speeds in a moment

• See http://en.wikipedia.org/wiki/Solid_state_drives for more information

• Particularly, visit section 4 Comparison of SSD with hard disk drives

• Still fairly expensive

• See section 6.1 Cost and capacity

• Bottom Line

• Hard drives are a thing of the past

• I’ll be buying an SSD when prices come down

PHYSICAL RECOMMENDATIONS NETWORK ATTACHED STORAGE (NAS)

• Storage internal to the computer is faster by any measure

• NAS is not a good option for speed (performance)

• Even on a Gigabit Ethernet backbone, 1 gigabit per second is at most 125 megabytes per second (about 100 MB/s as a rule of thumb)

• OK for smaller databases; you can forget it for anything over 1 GB

• You can just imagine how slow 10/100 networks are

• This will most likely be the bottleneck

• Better for network accessibility and archival purposes

PHYSICAL RECOMMENDATIONS HOST INTERFACE SPEEDS

Device                 Maximum (Theoretical) Speed
SATA Controller 3.0    600 MB/s (6 Gb/s)
SATA Controller 2.0    300 MB/s (3 Gb/s)
USB 3.0                400 MB/s
USB 2.0                60 MB/s
eSATA                  300 MB/s (3 Gb/s)
PCI Express (2.0)      Usually >1 GB/s (Depends on BUS Speeds)

1 Gigabit (Gb) = 125 Megabytes (MB), since 1000 Mb / 8 = 125 MB; 100 MB is a rule of thumb that allows for encoding and protocol overhead

http://en.wikipedia.org/wiki/Serial_ATA#Comparison_with_other_buses
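
To put these rates in perspective, the arithmetic below estimates how long it would take just to stream a 36.3 GB table (the size quoted in the opening statistics, taken here as 36,300 MB) at each theoretical maximum. Real throughput will be lower, so these are best-case figures.

```sql
-- 1 Gb/s expressed in MB/s (decimal units, 8 bits per byte).
SELECT 1000 / 8 AS gigabit_mb_per_s;          -- 125

-- Seconds to stream 36,300 MB at each theoretical rate.
SELECT 36300 / 12.5 AS fast_ethernet_s,       -- 100 Mb/s -> 2904 s
       36300 / 125  AS gigabit_ethernet_s,    -- 1 Gb/s   -> 290.4 s
       36300 / 300  AS sata2_s,               -- 3 Gb/s   -> 121 s
       36300 / 600  AS sata3_s;               -- 6 Gb/s   -> 60.5 s
```

A full table scan over Fast Ethernet would take the better part of an hour before MySQL does any work at all.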

WHAT IF MYSQL DOESN’T MEET MY NEEDS?

• Sometimes it won’t

• Variety of possibilities:

• Make it work

• Scale back business requirements

• Look for other options that will better serve your purposes

• Options outside of any RDBMS

• Migrate to another database server

• PostgreSQL, SQL Server, Lucene, MongoDB, etc.

• Write a custom engine for MySQL (Not recommended)

GETTING HELP

• If you need help, you can do several things:

1. Read the MySQL documentation

2. Use a search engine to try to find the answer

3. Ask in a forum or other service (or e-mail me)

• Include your table creation code

• If your question involves query optimization or indexes, post a list of indexes

• Finally, post your queries, questions, and EXPLAIN output (if applicable)

• EXPLAIN [EXTENDED] query
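
The information asked for above can be collected with a few statements. The table name games and the sample query are placeholders for your own.

```sql
-- Table definition, including engine and indexes (hypothetical table name).
SHOW CREATE TABLE games;
SHOW INDEX FROM games;

-- Execution plan; after EXPLAIN EXTENDED, SHOW WARNINGS prints the query
-- as the optimizer rewrote it.
EXPLAIN EXTENDED SELECT * FROM games WHERE id = 42;
SHOW WARNINGS;
```

Pasting this output as text, rather than as a screenshot, makes it much easier for others to help.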

WRAP UP

• MySQL documentation: http://dev.mysql.com/doc/refman/5.5/en/optimization.html

• Helpful information on choosing the right engine: http://www.supportsages.com/blog/2010/08/mysql-storage-engines-an-overview-their-limitations-and-an-attempt-for-comparison/

• Additional BTree information: http://en.wikipedia.org/wiki/B-tree

• Great study guide on MySQL query optimization as a whole: http://www.cs.ait.ac.th/laboratory/database/manual/manual_MySQL_Optimization.shtm

• More MySQL performance tips: http://forge.mysql.com/wiki/Top10SQLPerformanceTips

• MySQL on SSD: http://forums.mysql.com/read.php?123,194317,194317

• Moore’s Law: http://en.wikipedia.org/wiki/Moore%27s_law

• 5.2 Importance of non-CPU bottlenecks

• Interesting forum discussion: http://forums.mysql.com/read.php?21,423433,423433#msg-423433

WRAP UP

• Not entirely comprehensive

• My personal experience over the last two years

• Quick Google search

• I may have made mistakes while composing or presenting this presentation.

• A copy of this presentation will be available at http://www.bughousedb.com/

WRAP UP

• Please provide feedback on given forms (if applicable)

• Did the fact that I used Wikipedia in my sources bother you?

• How knowledgeable was I about the subject?

• How prepared was I for this presentation?

• Did I read off the presentation? If so, how much?

• Was I able to answer all questions?

• Was the PowerPoint easy to read and follow?

• Should I use more graphs, tables, charts, and examples?

• You can, optionally, remain anonymous (just don’t put your name on the forms)

• Any and all feedback would be appreciated. Feel free to be completely honest.

• And by all means, feel free to stay in touch. I enjoy helping.

THANKS FOR ATTENDING!

Any questions?

[email protected]