John Nahlen
Revised Edition
Copyright © 2011 John Nahlen
OPTIMIZING MYSQL DATABASES
ABOUT ME
• Graduated from Centennial High School, Boise
• Pending member of The National Society of Leadership and Success
• Work at Smith Optics, Ketchum, ID
• In my free time:
• Church
• Programming
• Java
• I host a few open source projects on Google Code
• PHP, MySQL
• Real Time Strategy games
• Business Professionals of America
• Won Regional, State, National Awards
INTRODUCTION TO MY APPLICATION
• Bughouse – a variant of chess
• Two boards, four players
• Timed, fast-paced game
• Games are played on a chess server, http://www.freechess.org
• I store these games on http://www.bughousedb.com
• Stored: games, moves, comments, users
• Not a business – no income from this app except for donations
INTRODUCTION TO MY APPLICATION – THE PROBLEM
• Note that I am attempting to tackle monster problems
• My passion and resolve stay the same
• Search as much relevant data as possible in the shortest amount of time
• Example: Trying to match 80,000,000+ (and growing) unique strings of a fixed length using REGEXP BINARY
• Example: Attempting to manage and optimize tables with 125,000,000+ rows
• And I know I’m not the only one!
TERMINOLOGY
• I use the following terms several times during this presentation:
• ACID: Atomicity, Consistency, Isolation, Durability
• Business Rules: Requirements of the application (as defined by project managers and so forth)
• Index Cardinality: The number of distinct values in an index; the lower it is, the more repetitive the data
SOME OPENING STATISTICS (AS OF 06/24/2011)
• Production (constantly growing)
• 311,767 games
• 27,678,752 moves (5.3 GB)
• Windows Home Server 2008 R2 Standard, 64-bit
• HP Compaq desktop, AMD Athlon Dual Core 4450e 2.30 GHz, 4 GB RAM
• Purchase year: 2010
• Import-In-Progress (permanently archived)
• 1,334,370 games
• 139,269,313 moves (36.3 GB)
• Windows 7 Home Premium, 64-bit
• HP Compaq laptop, AMD Athlon Dual Core QL-62 2.00 GHz, 4 GB RAM
• Purchase year: Feb 2009
STORAGE ENGINES (BASIC OVERVIEW)
• MyISAM
• Well-rounded engine; the default before MySQL 5.5 (InnoDB is the default from 5.5 on)
• Good for read-heavy applications
• InnoDB
• Great features and ACID compliant
• Transactions
• Foreign keys
• Unmatched CPU efficiency
• Good for write-heavy applications
• Many more configuration options
STORAGE ENGINES (BASIC OVERVIEW)
• MEMORY
• Supports Hash indexing
• Stores data in RAM
• Very fast access (See footnote)
• Indexes also stored in RAM
• If you have the memory to spare
• Limited in size to max_heap_table_size config
• More on this engine in a moment
http://en.wikipedia.org/wiki/List_of_device_bit_rates#Memory_Interconnect.2FRAM_buses
STORAGE ENGINES
• Later in the presentation, I will discuss additional engines
• Not included in the default MySQL build
• Picking the right engine for the application is no easy task
• Ultimately depends on the business rules
MEMORY ENGINE – THE SOLUTION TO ALL YOUR PROBLEMS?
• Not quite.
• RAM is far more expensive per gigabyte than disk
• Data is not persistent, as RAM is a volatile storage device
• On power outage, shutdown, etc., all data in RAM is lost
• Significantly (100x-1000x) faster than a hard drive
• Memory bandwidth is limited by CPU and bus speeds
• Remember that 32-bit OSes can typically use only around 3 GB of memory
FULL TABLE SCANS VS RANGE SCANS
• Full Table Scan
• Used when no usable index is available
• Must iterate through every row in the table
• These are almost always slow on large tables
• Range Scan
• Primary difference from a Full Table Scan is that it uses indexes
• The more selective the indexed filters, the fewer rows MySQL has to read
• Example: SELECT * FROM mytable WHERE x = 5 AND y = 3
• If x is indexed, MySQL finds the rows where x = 5 first
• If x has no index but y does, it runs the y = 3 search first
• MySQL then applies the remaining filter to the rows it found
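The behavior described above can be checked with EXPLAIN. The following is a minimal sketch, assuming a hypothetical table definition (the column types and index names are mine, not from the application). It should be run against a table that already holds a realistic amount of data, since the optimizer may skip indexes on empty or tiny tables.

```sql
-- Hypothetical table for illustration; only the names mytable, x and y
-- come from the slide.
CREATE TABLE mytable (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  x  INT NOT NULL,
  y  INT NOT NULL,
  INDEX idx_x (x)
) ENGINE=MyISAM;

-- With an index on x, the "key" column of the EXPLAIN output should name
-- idx_x, and y = 3 is checked only on the rows that index lookup returns.
EXPLAIN SELECT * FROM mytable WHERE x = 5 AND y = 3;

-- Swap the index over to y: the lookup should now go through idx_y,
-- and x = 5 becomes the filter applied afterwards.
ALTER TABLE mytable DROP INDEX idx_x, ADD INDEX idx_y (y);
EXPLAIN SELECT * FROM mytable WHERE x = 5 AND y = 3;

-- With no usable index left, EXPLAIN should report type = ALL,
-- which is a full table scan.
ALTER TABLE mytable DROP INDEX idx_y;
EXPLAIN SELECT * FROM mytable WHERE x = 5 AND y = 3;
```

When both columns are indexed, the optimizer picks whichever index it estimates to be more selective, so the order of the conditions in the WHERE clause does not decide which one runs first.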
WHAT IS AN INDEX AND WHY USE THEM?
• An index is a lookup structure that maps column values to the rows that contain them
• Quickly eliminates rows that don’t match the indexed column
• Improves speed of SELECT statements
• Fewer rows to scan through to match WHERE clause criteria (See Range Scans)
• When an index is not available, a full table scan is required
• Searching on an arbitrary, un-indexed column
• Wrapping an indexed column in a function in the WHERE clause prevents index use and forces a full table scan
• Cannot use the index: SELECT * FROM mytable WHERE DATE(mydate) > '2011-01-01'
• Can use the index: SELECT * FROM mytable WHERE mydate > '2011-01-01'
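A sketch of how such a query can be rewritten so that the column stands alone on one side of the comparison. The table name and the DATETIME column type are assumptions made for illustration.

```sql
-- Hypothetical table; mydate is assumed to be a DATETIME column.
CREATE TABLE mytable_dates (
  id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  mydate DATETIME NOT NULL,
  INDEX idx_mydate (mydate)
);

-- The function hides the column from the optimizer: every row must be
-- read and passed through DATE() before it can be compared.
EXPLAIN SELECT * FROM mytable_dates WHERE DATE(mydate) > '2011-01-01';

-- Same result set, but the bare column is compared with a constant,
-- so idx_mydate is available for a range scan.
-- "date part later than Jan 1" is the same as "on or after Jan 2, 00:00:00".
EXPLAIN SELECT * FROM mytable_dates WHERE mydate >= '2011-01-02 00:00:00';
```

Even with the rewrite, the optimizer may still choose a full scan when the range covers most of the table; the point is that the index becomes an option at all.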
INDEXING
• B-Tree is NOT Binary Tree!
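• Index the columns that appear in the WHERE clause
• Multiple-column vs. Single-column indexes
• Null values can be indexed
• B-Tree lookups must use a leftmost prefix of the index (the leading columns, or the leading characters of a string)
• The values compared against must be constants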
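The prefix rule is easier to see with a multiple-column index. This is an illustrative sketch; the players table and its columns are hypothetical.

```sql
-- Hypothetical table with one two-column index and one nullable indexed column.
CREATE TABLE players (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  last_name  VARCHAR(30) NOT NULL,
  first_name VARCHAR(30) NOT NULL,
  rating     INT NULL,
  INDEX idx_name (last_name, first_name),
  INDEX idx_rating (rating)
);

-- Can use idx_name: the leading column is constrained.
EXPLAIN SELECT * FROM players WHERE last_name = 'Smith';

-- Can use idx_name on both columns.
EXPLAIN SELECT * FROM players WHERE last_name = 'Smith' AND first_name = 'Ann';

-- Can use idx_name: the leading characters of the string are given.
EXPLAIN SELECT * FROM players WHERE last_name LIKE 'Sm%';

-- Cannot use idx_name for the lookup: the leading column is skipped.
EXPLAIN SELECT * FROM players WHERE first_name = 'Ann';

-- Cannot use idx_name for the lookup: the pattern starts with a wildcard.
EXPLAIN SELECT * FROM players WHERE last_name LIKE '%ith';

-- NULL values are indexed, so IS NULL can use idx_rating.
EXPLAIN SELECT * FROM players WHERE rating IS NULL;
```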
INDEXING (TYPES OF)
• Hash Indexing
• Very fast for exact comparisons (=, <=>), but not used for ranges (See footnote 1)
• Supported by few engines
• Think of it as a key/value dictionary
• O(1) lookup
• B-Tree Indexing
• The default index type
• Best for relative comparisons ( < , > )
• Only useful index type on columns in GROUP BY, ORDER BY clauses
• O(log N) lookup
• Fulltext Indexing (MyISAM Only) - Not covered in this presentation
1 http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
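As a rough illustration of choosing between the two index types, the MEMORY engine lets you name the type per index. The table and column names below are hypothetical.

```sql
-- Hypothetical MEMORY table carrying one index of each type.
CREATE TABLE session_cache (
  session_id CHAR(32) NOT NULL,
  last_seen  DATETIME NOT NULL,
  INDEX idx_session   USING HASH  (session_id),  -- exact-match lookups
  INDEX idx_last_seen USING BTREE (last_seen)    -- ranges and ordering
) ENGINE=MEMORY;

-- Equality lookup: a good fit for the hash index.
EXPLAIN SELECT * FROM session_cache
WHERE session_id = '0123456789abcdef0123456789abcdef';

-- Range plus ORDER BY: only the B-Tree index can help here.
EXPLAIN SELECT * FROM session_cache
WHERE last_seen > '2011-06-01 00:00:00'
ORDER BY last_seen;
```

If idx_last_seen were declared USING HASH instead, the second query would have no index it could use for the range.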
VIEWS
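• Do not support indexes of their own
• Views that MySQL has to materialize into a temporary table (for example, views with GROUP BY or aggregates) do not use the base table’s indexes for the outer query
• Performance hits can be significant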
• ALTERNATIVE: Storing in tables instead?
• Can be indexed (Good)
• Can use any storage engine (Good)
• May cause data integrity issues
• Depending on how they’re used
• Research Triggers in MySQL (Not covered here)
SUBQUERIES
• Not ideal in most circumstances
• Make sure to use indexes
• Subqueries as derived tables
• Temporary tables to reuse when necessary
• Subqueries vs. self-joins vs. CASE … END/GROUP BY
• Example
• Relevant to my application
• Something to consider
SCHEMA OPTIMIZATION
• Many speed factors rely on database design
• On the other hand, over-normalization may have a negative effect
• Will expand this in the future
GENERAL QUERY OPTIMIZATION
• EXPLAIN is your friend (See footnote 1)
• Consider query caching
• Useless (even detrimental) when the data changes frequently
• Different from Summary Tables
• Examine wait times for row/table locks for high concurrency applications
• Use the slow query log
• If possible, avoid nullable columns
• Consider fixed-length rows if possible (next page)
1 http://www.cs.ait.ac.th/laboratory/database/manual/manual_MySQL_Optimization.shtml#EXPLAIN
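A few statements that cover the checks listed above. This is a sketch for MySQL 5.1 through 5.7; the threshold of 2 seconds is an arbitrary example value, and changing global variables requires the SUPER privilege.

```sql
-- Turn on the slow query log at runtime and log anything slower than 2 seconds.
-- The new long_query_time applies to connections opened after the change.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;

-- Where the log is being written.
SHOW VARIABLES LIKE 'slow_query_log_file';

-- Query cache counters: many Qcache_inserts and Qcache_lowmem_prunes relative
-- to Qcache_hits suggest the cache is being churned rather than reused.
SHOW STATUS LIKE 'Qcache%';

-- Table lock contention: compare Table_locks_waited with Table_locks_immediate.
SHOW STATUS LIKE 'Table_locks%';
```

The query cache was removed in MySQL 8.0, so the Qcache counters return nothing there.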
JOIN OPTIMIZATION
• Joined columns in both tables should be of the same type and length
• CHAR and VARCHAR are considered the same if declared with the same length
• Joining TEXT with VARCHAR(n) will not use the index
• Joining (for example) INT(5) with INT(11) will use the index (the number is only a display width)
• Joining (for example) VARCHAR(11) and VARCHAR(15) may not use the index efficiently
• Integer joins are faster than string joins
• String PRIMARY KEYs are discouraged (e.g. GUIDs)
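A small sketch of a join that follows these rules: both sides of the join use the same integer type. The schema is hypothetical and is not the actual bughousedb.com schema.

```sql
-- Hypothetical parent table with an integer primary key.
CREATE TABLE games (
  game_id   INT UNSIGNED NOT NULL PRIMARY KEY,
  played_on DATE NOT NULL
);

-- Child table: game_id is declared exactly like the parent's key
-- (same type, same signedness) and is indexed.
CREATE TABLE moves (
  move_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  game_id   INT UNSIGNED NOT NULL,
  move_text VARCHAR(10) NOT NULL,
  INDEX idx_game (game_id)
);

-- The row for "m" in the EXPLAIN output should show idx_game in the
-- "key" column and the parent's game_id in the "ref" column.
EXPLAIN
SELECT g.game_id, m.move_text
FROM games AS g
JOIN moves AS m ON m.game_id = g.game_id
WHERE g.played_on >= '2011-01-01';
```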
IN CLAUSE OPTIMIZATION
• What is in the IN clause makes a big difference
• Constants are acceptable in the IN clause (See footnote 1)
SELECT * FROM mytable WHERE x IN (1,2,3,4)
• Indexed columns in the IN list:
• The exact opposite is true
• You should avoid IN (See footnote 2) and use OR or UNION instead
• Avoid: SELECT * FROM mytable WHERE y IN (ColA, ColB, ColC, ColD)
• Prefer: SELECT * FROM mytable WHERE y = ColA OR y = ColB OR y = ColC OR y = ColD
1 http://stackoverflow.com/questions/782915/mysql-or-vs-in-performance
2 See #24 on http://forge.mysql.com/wiki/Top10SQLPerformanceTips
FIXED TABLE LENGTH
• Arguments for fixed-length rows:
• “Fastest of the on-disk formats” (See footnote 1)
• Arguments against fixed-length rows:
• http://forums.mysql.com/read.php?21,423433,423846#msg-423846
• http://forums.mysql.com/read.php?21,423433,423982#msg-423982
• Uses more disk space
• Cannot have BLOB or TEXT columns
• MEMORY engine always uses this format and it cannot be changed
• Choose the ideal column types and widths for your application
1 http://dev.mysql.com/doc/refman/5.0/en/static-format.html
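For reference, a fixed-length MyISAM table could look like the sketch below. The table and its columns are hypothetical; the key point is that every column has a fixed width (CHAR rather than VARCHAR, no BLOB or TEXT).

```sql
-- Hypothetical table using only fixed-width column types.
CREATE TABLE ratings_fixed (
  player_id INT UNSIGNED NOT NULL,
  handle    CHAR(17) NOT NULL,          -- padded to 17 characters on disk
  rating    SMALLINT UNSIGNED NOT NULL,
  PRIMARY KEY (player_id)
) ENGINE=MyISAM ROW_FORMAT=FIXED;

-- The Row_format column of the output should read "Fixed".
SHOW TABLE STATUS LIKE 'ratings_fixed';
```

The padding on CHAR columns is where the extra disk space goes, so this trade-off works best when the stored strings are close to the declared width.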
SUMMARY TABLES
• A table that holds (usually aggregate) values computed from a much larger table
• Often useful when querying the larger table would take a lot of time
• As these are ordinary tables, all storage engines are available
• I call these tables “application cache”
• Most often used for generating reports (eg weekly, monthly, etc)
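One possible shape for such a table, as a sketch only. Both tables and the monthly grouping are assumptions made for illustration, not the application's actual reports.

```sql
-- Hypothetical large source table.
CREATE TABLE game_results (
  game_id    INT UNSIGNED NOT NULL PRIMARY KEY,
  played_on  DATE NOT NULL,
  move_count SMALLINT UNSIGNED NOT NULL
);

-- Hypothetical summary table: one row per calendar month.
CREATE TABLE monthly_game_summary (
  month_start  DATE NOT NULL PRIMARY KEY,
  games_played INT UNSIGNED NOT NULL,
  avg_moves    DECIMAL(6,2) NOT NULL
);

-- Refresh step, run on a schedule (for example nightly).
-- REPLACE overwrites a month's row if it already exists.
REPLACE INTO monthly_game_summary (month_start, games_played, avg_moves)
SELECT DATE_FORMAT(played_on, '%Y-%m-01'), COUNT(*), AVG(move_count)
FROM game_results
GROUP BY DATE_FORMAT(played_on, '%Y-%m-01');

-- Reports read the small table instead of aggregating the large one.
SELECT month_start, games_played, avg_moves
FROM monthly_game_summary
WHERE month_start >= '2011-01-01'
ORDER BY month_start;
```

The summary is only as fresh as its last refresh, which is the trade-off for not scanning the large table on every report.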
SHRINKING DATA
• Research the myisampack utility
• Use COMPRESS() and related functions
• Move low-cardinality data into a separate table (with distinct values) and reference it by foreign key
• Only index columns that you plan on searching
• Index only a prefix of long string columns (See footnote 1)
1 See http://dev.mysql.com/doc/refman/5.0/en/create-index.html
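Two of these techniques in a short sketch. The comments table, its columns, and the prefix length of 10 are hypothetical choices.

```sql
-- Hypothetical table; the compressed body is binary, so it goes in a BLOB.
CREATE TABLE comments (
  comment_id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  author          VARCHAR(100) NOT NULL,
  body_compressed BLOB NOT NULL
);

-- Index only the first 10 characters of author instead of all 100.
CREATE INDEX idx_author_prefix ON comments (author(10));

-- Check how much selectivity the prefix keeps: a value close to 1 means
-- the 10-character prefix distinguishes authors almost as well as the
-- full column does.
SELECT COUNT(DISTINCT LEFT(author, 10)) / COUNT(DISTINCT author) AS prefix_selectivity
FROM comments;

-- Store text compressed and expand it on read.
INSERT INTO comments (author, body_compressed)
VALUES ('example_user', COMPRESS('A long comment that would otherwise take more space.'));

SELECT author, UNCOMPRESS(body_compressed) AS body
FROM comments;
```

Compressed values cannot be searched or indexed by content, so this suits text that is only displayed, not filtered on.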
WHY AREN’T MY INDEXES WORKING?
• See if you have a Hash index where you need a B-tree index
• Use B-tree for GROUP BY, ORDER BY clauses
• If table is too small, MySQL may choose to not use indexes
• Indexes on date columns should almost always be B-Tree
• Dates are not stored as YYYY-MM-DD HH:MM:SS strings, only rendered that way
• The entire next slide demonstrates this
• Try using Index Hints
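Index hints go directly after the table name. A minimal sketch, assuming a hypothetical table with two candidate indexes:

```sql
-- Hypothetical table with two indexes the optimizer could choose between.
CREATE TABLE moves_log (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  played_at DATETIME NOT NULL,
  player    VARCHAR(20) NOT NULL,
  INDEX idx_played_at (played_at),
  INDEX idx_player (player)
);

-- Suggest an index: MySQL considers only the listed index,
-- but may still fall back to a table scan.
EXPLAIN SELECT * FROM moves_log USE INDEX (idx_played_at)
WHERE played_at >= '2011-06-01' AND player = 'example';

-- Insist on an index: a table scan is treated as very expensive.
EXPLAIN SELECT * FROM moves_log FORCE INDEX (idx_played_at)
WHERE played_at >= '2011-06-01' AND player = 'example';

-- Rule an index out.
EXPLAIN SELECT * FROM moves_log IGNORE INDEX (idx_player)
WHERE played_at >= '2011-06-01' AND player = 'example';
```

Comparing the EXPLAIN output with and without the hint shows whether the optimizer's own choice was the problem.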
WHY AREN’T MY INDEXES WORKING? (DEMONSTRATION)
[Demonstration slide on date column storage; slide content not reproduced]
LOCKING
• Table Locking
• Default for most MySQL engines (MyISAM, MEMORY)
• Useful when there are more reads than writes
• Many reader threads, or one writer thread at a time
• Row Locking
• Many reader and writer threads at once, as long as they touch different rows
• Supported by InnoDB, but not by MyISAM or MEMORY
• Fewer locking conflicts
• Slower if many locks must be acquired
INSERTING ROWS
• LOAD DATA INFILE is the fastest
• Disable keys (nonunique indexes) during the load
• Multi-row INSERT statements
• Locking tables with explicit flush/unlock
• Faster with even two or three inserts
• Indexes don’t have to be flushed to disk until the tables are unlocked
• Tune the bulk_insert_buffer_size variable in the MySQL config file
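The steps above, combined into one sketch for a MyISAM table. The table is hypothetical, and the file path in the LOAD DATA statement is a placeholder that must point to a file the server is allowed to read.

```sql
-- Hypothetical MyISAM table with one nonunique index.
CREATE TABLE bulk_moves (
  move_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  game_id   INT UNSIGNED NOT NULL,
  move_text VARCHAR(10) NOT NULL,
  INDEX idx_game (game_id)
) ENGINE=MyISAM;

-- Stop maintaining nonunique indexes during the load (MyISAM only).
ALTER TABLE bulk_moves DISABLE KEYS;

-- Hold one write lock for the whole batch so the index buffer is
-- flushed once at UNLOCK rather than after every statement.
LOCK TABLES bulk_moves WRITE;

-- One multi-row INSERT instead of three single-row statements.
INSERT INTO bulk_moves (game_id, move_text) VALUES
  (1, 'e4'),
  (1, 'e5'),
  (2, 'd4');

UNLOCK TABLES;

-- Rebuild the disabled indexes in one pass.
ALTER TABLE bulk_moves ENABLE KEYS;

-- Fastest option when the data is already in a file.
-- '/tmp/moves.csv' is a hypothetical path.
LOAD DATA INFILE '/tmp/moves.csv'
INTO TABLE bulk_moves
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(game_id, move_text);
```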
MYSQL/MYISAM CONFIGURATION
• Prefer 64 bit if you have more than 3 GB RAM
• 32-bit (x86) systems can typically use only about this much
• The bigger, the better (depending on amount of data and available RAM)
• key_buffer_size
• read_buffer_size
• read_rnd_buffer_size
• sort_buffer_size
• Size depends on application requirements/business rules
• query_cache_size
• query_cache_type
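These settings can be inspected from any client session, and some can be changed without a restart. The 512 MB figure below is an arbitrary example, not a recommendation; a change made with SET GLOBAL is lost on restart unless it is also written to the config file.

```sql
-- Current values of the variables listed above.
SHOW VARIABLES WHERE Variable_name IN (
  'key_buffer_size', 'read_buffer_size', 'read_rnd_buffer_size',
  'sort_buffer_size', 'query_cache_size', 'query_cache_type'
);

-- Resize the MyISAM key buffer at runtime (requires SUPER).
-- 512 MB is an assumed value for a machine with RAM to spare.
SET GLOBAL key_buffer_size = 512 * 1024 * 1024;

-- Judge whether the key buffer is big enough: Key_reads (index blocks read
-- from disk) should stay small relative to Key_read_requests.
SHOW STATUS LIKE 'Key_read%';
```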
MYSQL CLUSTER / REPLICATION
• A way to load balance queries over multiple sets of physical hardware
• Could be called a “MySQL Cloud”
• From a performance standpoint, offers significant read speed advantages
• The advantage is biggest in internal networks
• On external networks, remember TCP/IP overhead and bandwidth
• Replication also provides data availability, security, and consistency
TOKUDB™ ENGINE
• Technology called Fractal Tree Indexes™
• Insertions 20x to 80x faster (quoted from website)
• Lays out indexes in a sequential manner
• Indexes don’t fragment
• Sequential I/O is faster on rotational media than random I/O
• This technology was developed at MIT
• Full ACID compliance
• Supports 5x-15x data compression by default
• Numerous additional features
TOKUDB™ ENGINE
• Commercial MySQL Engine
• Has a free version of 50 GB max*
• Only available on Linux for now
• My databases are only located on Windows right now
• See www.tokutek.com
• I would recommend checking this out
* For production uses. Free (no space limit) for evaluation, development purposes, and academic research.
PHYSICAL RECOMMENDATIONS
• Hardware performance is a key issue
• Many hardware components affect how fast a computer runs
• I’ve thought about numerous possible hardware combinations
• Far too many to list in this presentation
• The possibilities are almost endless
• Databases are only as fast as the media they’re on
• Find the source of the problem (bottleneck)
• This would be better than blindly buying new hardware
PHYSICAL RECOMMENDATIONS
• The more resources, the better (up to a certain point)
• Consider how much of the machine’s available resources MySQL may use
• Don’t want to choke out other applications
• In the case of RAM, using too much would cause paging in the OS
• Would result in seriously degraded performance
PHYSICAL RECOMMENDATIONS – DISK FRAGMENTATION – INTRODUCTION
• Fragmentation is where pieces of a file are scattered across the hard drive
• Takes more seeks and rotations to find/read those files
• This is irrelevant for Flash Drives, SSD, or RAM
• Defragmentation makes each of those files one large contiguous file
• Consider purchasing commercial defragmentation software
• Not applicable to the TokuDB engine
• There are many solutions on the market
• I will discuss one product on the next slide
• You may want to do your own research as well
• Some argue the gain is negligible compared to creating a more efficient schema
PHYSICAL RECOMMENDATIONS – DISK FRAGMENTATION – DISKEEPER 2011
• I have recently purchased Diskeeper Pro Premier 2011
• I-FAAST should give a real performance boost for disk-based databases
• Case study on Diskeeper + Microsoft SQL Server:
http://www.diskeeper.com/defrag/dk-boost-sql-server.aspx
• I no longer trust the Windows built-in defrag software
• This software is only available on Windows
Side Notes:
• Free, fully-functional 30-day trial
• Students with .edu e-mail addresses get fairly nice discounts
MYSQL + SOLID STATE (SSD)
• Solid State Drive (SSD) Basics
• No moving parts, no spin up time
• Most consumer SSDs store data on Flash memory
• Doesn’t require defragmentation
• All performance statistics are important
• Each one serves a different purpose
• For databases with random I/O, look for “IOPS”
• A low-end SSD should still be in the 10,000 IOPS range
• A hard disk runs at around 100-200 IOPS
MYSQL + SOLID STATE (SSD)
• Choose the host interface carefully
• We will discuss more about these speeds in a moment
• See http://en.wikipedia.org/wiki/Solid_state_drives for more information
• Particularly, visit section 4 Comparison of SSD with hard disk drives
• Still fairly expensive
• See section 6.1 Cost and capacity
• Bottom Line
• Hard drives are a thing of the past
• I’ll be buying an SSD when prices come down
PHYSICAL RECOMMENDATIONS – NETWORK ATTACHED STORAGE (NAS)
• Storage internal to the computer is faster in almost every case
• NAS is not a good option for speed (performance)
• Even on a Gigabit Ethernet backbone, 1 gigabit per second is at most 125 megabytes per second (closer to 100 in practice)
• OK for smaller databases; you can forget it for anything over 1 GB
• You can just imagine how slow 10/100 networks are
• This will most likely be the bottleneck
• Better for network accessibility and archival purposes
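To put numbers on this, the arithmetic can be done straight from the mysql client. The sketch below uses the 36.3 GB moves table from the opening statistics and assumes the link runs at its full theoretical rate with no protocol overhead, so real transfers would be slower.

```sql
SELECT
  -- 1 gigabit/s in megabytes/s: 1,000,000,000 bits / 8 / 1,000,000 = 125
  1000000000 / 8 / 1000000 AS gigabit_mb_per_s,
  -- seconds to read 36.3 GB once over Gigabit Ethernet (about 290 s)
  36.3 * 1000 / 125        AS seconds_on_gigabit,
  -- the same read over a 100 Mb/s network at 12.5 MB/s (about 2,900 s)
  36.3 * 1000 / 12.5       AS seconds_on_100_mbit;
```

A single full table scan of that table would therefore take close to five minutes over Gigabit Ethernet at best, before any disk or protocol overhead is counted.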
PHYSICAL RECOMMENDATIONS – HOST INTERFACE SPEEDS
Device                 Maximum (Theoretical) Speed
SATA Controller 3.0    600 MB/s (6 Gb/s)
SATA Controller 2.0    300 MB/s (3 Gb/s)
USB 3.0                400 MB/s
USB 2.0                60 MB/s
eSATA                  300 MB/s (3 Gb/s)
PCI Express (2.0)      Usually >1 GB/s (depends on bus speeds)
Rule of thumb: 1 gigabit (Gb) per second ≈ 100 megabytes (MB) per second; the raw figure is 125 MB/s before encoding overhead
http://en.wikipedia.org/wiki/Serial_ATA#Comparison_with_other_buses
WHAT IF MYSQL DOESN’T MEET MY NEEDS?
• Sometimes it won’t
• Variety of possibilities:
• Make it work
• Scale back business requirements
• Look for other options that will better serve your purposes
• Options outside of any RDBMS
• Migrate to another database server
• PostgreSQL, SQL Server, Lucene, MongoDB, etc.
• Write a custom engine for MySQL (Not recommended)
GETTING HELP
• If you need help, you can do several things:
1. Read the MySQL documentation
2. Use a search engine to try to find the answer
3. Ask in a forum or other service (or e-mail me)
• Include your table creation code
• If your question involves query optimization or indexes, post a list of indexes
• Finally, post your queries, questions, and EXPLAIN code (if applicable)
• EXPLAIN [EXTENDED] query
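The statements below collect everything listed above in a form that can be pasted into a forum post. The table is a hypothetical stand-in; substitute your own table and the query you are asking about.

```sql
-- Hypothetical table standing in for the one you need help with.
CREATE TABLE help_example (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  x  INT NOT NULL,
  INDEX idx_x (x)
);

-- Table creation code, exactly as the server sees it.
SHOW CREATE TABLE help_example;

-- List of indexes, including their cardinality.
SHOW INDEX FROM help_example;

-- Execution plan for the problem query.
EXPLAIN EXTENDED SELECT * FROM help_example WHERE x = 5;

-- After EXPLAIN EXTENDED, this shows the query as the optimizer rewrote it.
SHOW WARNINGS;
```

EXPLAIN EXTENDED applies to the MySQL 5.x series this presentation covers; MySQL 8.0 dropped the EXTENDED keyword, and plain EXPLAIN followed by SHOW WARNINGS gives the same information there.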
WRAP UP
• MySQL documentation: http://dev.mysql.com/doc/refman/5.5/en/optimization.html
• Helpful information on choosing the right engine: http://www.supportsages.com/blog/2010/08/mysql-storage-engines-an-overview-their-limitations-and-an-attempt-for-comparison/
• Additional BTree information: http://en.wikipedia.org/wiki/B-tree
• Great study guide on MySQL query optimization as a whole: http://www.cs.ait.ac.th/laboratory/database/manual/manual_MySQL_Optimization.shtm
• More MySQL performance tips: http://forge.mysql.com/wiki/Top10SQLPerformanceTips
• MySQL on SSD: http://forums.mysql.com/read.php?123,194317,194317
• Moore’s Law: http://en.wikipedia.org/wiki/Moore%27s_law
• 5.2 Importance of non-CPU bottlenecks
• Interesting forum discussion: http://forums.mysql.com/read.php?21,423433,423433#msg-423433
WRAP UP
• Not entirely comprehensive
• My personal experience over the last two years
• Quick Google search
• I may have made mistakes while composing or presenting this presentation.
• A copy of this presentation will be available at http://www.bughousedb.com/
WRAP UP
• Please provide feedback on given forms (if applicable)
• Did the fact that I used Wikipedia in my sources bother you?
• How knowledgeable was I about the subject?
• How prepared was I for this presentation?
• Did I read off the presentation? If so, how much?
• Was I able to answer all questions?
• Was the PowerPoint easy to read and follow?
• Should I use more graphs, tables, charts, and examples?
• You can, optionally, remain anonymous (just don’t put your name on the forms)
• Any and all feedback would be appreciated. Feel free to be completely honest.
• And by all means, feel free to stay in touch. I enjoy helping.