Resolving and Preventing MySQL Downtime

Common MySQL service impacting challenges, resolutions and prevention.

Jervin Real

Jervin Real

• Technical Services Manager – APAC

• Engineer Engineering Engineers

What is Downtime?

Application

Users

BOSS

Why Prevent Downtime?

• Your business loses money when the application is down

• You and your team’s reputation suffers

Agenda

• Real world adventures

  • Problems

  • Solutions

  • Prevention

• Putting them all together

I Had a Crash on You

I Had a Crash on You: Page Corruption

I Had a Crash on You: Page Corruption

• Disk bad sectors problem

• No monitoring, checks

• Page corruption on disk level, crashes when reading page from disk

• … and it keeps crashing

I Had a Crash on You: Page Corruption

• Percona Server, we tried:

  • innodb_corrupt_table_action = salvage

• Worked!

• Dropped table, recreated - application back online

• Worst case:

  • innodb_force_recovery > 0

  • Data Recovery

I Had a Crash on You: Assertion

• Running 5.6.11, early adopter, InnoDB FULLTEXT

• Upgrade to 5.6.18, MySQL crashed

• Data was unusable - bug#72079

I Had a Crash on You: Assertion

• Downgrade and restore from backup

• Re-execute upgrade to avoid the bug

I Had a Crash on You: Assertion

• innodb_corrupt_table_action = salvage|warn

• pt-table-checksum

  • Regularly scan all your data and check the error log for errors

• RAID card health checks

  • Can vary by vendor

• SMART checks

  • Be vigilant for disk level errors

• Plan your upgrades properly

Nobody’s Watching

Nobody’s Watching: Nobody Cared

• Percona XtraDB Cluster, 3 nodes

• A few months ago, node 3 went down due to a conflict, but nobody noticed

• A few hours ago, node 2 was killed by OOM and the cluster lost quorum

• EVERYBODY NOTICED!
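
The arithmetic behind this outage can be sketched as below. This is a minimal illustration, assuming equal node weights and the usual majority rule, where the surviving nodes must be strictly more than half of the last known membership. The function name `has_quorum` is made up for this example.

```python
def has_quorum(reachable_nodes: int, last_known_size: int) -> bool:
    """Return True if the reachable nodes form a strict majority.

    Assumes every node has equal weight and that the missing nodes
    disappeared without leaving the cluster gracefully.
    """
    return 2 * reachable_nodes > last_known_size


# Three healthy nodes, node 3 goes down: 2 of 3 is still a majority.
print(has_quorum(2, 3))
# Months later only 2 nodes are left and node 2 is OOM-killed:
# 1 of 2 is exactly half, which is not a majority.
print(has_quorum(1, 2))
```

Once the cluster had quietly shrunk to two nodes, it could not survive any single unplanned failure, which is why the first loss went unnoticed and the second took everything down.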

Nobody’s Watching: Nobody Cared

• Bootstrap remaining node

  • mysql> SET GLOBAL wsrep_provider_options='pc.bootstrap=1';

• SST second and third node

• Define wsrep_notify_cmd temporarily

• Implement better alerting

Nobody’s Watching: Dropped the Bomb

• New sysadmin received disk space alert

• du -hx --max-depth=1 /

• /var has lots of data

• find /var/ -size +5G -exec rm -rf {} \;

• Bam, ibdata1 gone!

• Restart maintenance occurred later in the day ...
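
On Linux, a file that is removed while a process still holds it open stays on disk until the handle is closed. That is why the server kept running until the restart. The sketch below shows one way to spot that condition early. It is not the implementation of any Percona plugin, it assumes a Linux `/proc` filesystem where removed files show up with a ` (deleted)` suffix, and the function names are invented for illustration.

```python
import os


def deleted_targets(link_targets):
    """Return the paths whose /proc fd link text is marked as deleted."""
    suffix = " (deleted)"
    return [t[: -len(suffix)] for t in link_targets if t.endswith(suffix)]


def open_deleted_files(pid):
    """List deleted-but-still-open files for a process (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue  # descriptor was closed while we were scanning
    return deleted_targets(targets)
```

Reading another user's `/proc/<pid>/fd` normally needs root or the same user as the process, so a check like this would have to run with matching privileges.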

Nobody’s Watching: Dropped the Bomb

• Restore from backup

• Really, they were lucky!

• What if there were no backups and innodb_file_per_table = 0?

Nobody’s Watching: Prevention

• Percona Monitoring Plugins

  • pmp-check-deleted-files

  • pmp-check-mysql-status

  • pmp-check-mysql-innodb

• Define a script executable by mysql user

  • Triggered on node state changes

• Take backups, and alert on failure

  • https://github.com/dotmanila/pyxbackup
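
A script triggered on node state changes could look roughly like the sketch below. It assumes the script is wired up through `wsrep_notify_cmd` and receives `--status`, `--uuid`, `--primary`, `--members` and `--index` arguments, which is how the Galera documentation describes the call as far as I know; verify this against your version. The function names and the `expected_nodes` default are illustrative only.

```python
import argparse


def parse_notification(argv):
    """Parse the arguments handed to a cluster notification script."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--status", default="")
    parser.add_argument("--uuid", default="")
    parser.add_argument("--primary", default="")
    parser.add_argument("--members", default="")
    parser.add_argument("--index", default="")
    # Ignore anything unexpected instead of failing inside the server hook.
    args, _unknown = parser.parse_known_args(argv)
    return args


def needs_alert(args, expected_nodes=3):
    """Decide whether this state change should notify a human."""
    members = [m for m in args.members.split(",") if m]
    if args.primary and args.primary.lower() != "yes":
        return True  # this component is no longer primary
    if args.members and len(members) < expected_nodes:
        return True  # at least one node has dropped out
    return args.status.lower() == "error"
```

In a real deployment the script would call `parse_notification(sys.argv[1:])` and send the alert through whatever channel the team already watches. The point is that a shrinking member list raises an alarm months before quorum is at risk.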

Self Induced Pain

Self Induced Pain: Query Cache Lock

• “Waiting for query cache lock”

Self Induced Pain: Query Cache Lock

• Global mutex, point of contention

  • More so on hot dataset/table

  • Worse, with large QC

Self Induced Pain: Query Cache Lock

• Set it to a small size to reduce performance overhead

• Disable it completely to avoid contention

• Hint offending queries to skip the query cache i.e. SELECT SQL_NO_CACHE
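
One way to judge whether the query cache is worth keeping at all is its hit ratio. The sketch below computes it from the `Qcache_hits` and `Com_select` counters, relying on the fact that a SELECT served from the cache increments `Qcache_hits` rather than `Com_select`. The function name and the dictionary input are assumptions for this example; the values would come from `SHOW GLOBAL STATUS`.

```python
def query_cache_hit_ratio(status):
    """Fraction of SELECT statements answered from the query cache."""
    hits = int(status.get("Qcache_hits", 0))
    misses = int(status.get("Com_select", 0))
    total = hits + misses
    return hits / total if total else 0.0


# Counter values as strings, the way a status query returns them.
print(query_cache_hit_ratio({"Qcache_hits": "25", "Com_select": "75"}))
```

A low ratio on a busy server suggests the cache is mostly adding mutex contention, which supports disabling it rather than tuning it.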

Self Induced Pain: Buffer Pool Dump/Restore

• Dumps buffer pool page list to disk

• Reloads buffer pool based on this list at startup

• Meant to help speed up buffer pool warmup

Self Induced Pain: Buffer Pool Dump/Restore

• Maintenance restart, buffer dump and restore enabled

• Yay! Expecting everything to go well.

• 30 minutes in, performance is still really bad, IO thrashing

• Large buffer pool, busy read/write

Self Induced Pain: Buffer Pool Dump/Restore

• Extend your maintenance period to let the server warm up if possible, otherwise the warmup and live traffic will contend on IO

• RAID1 of 2 SATA disks is not a license to use buffer pool warmup on 240GB of buffer pool
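
A rough back-of-the-envelope estimate shows why. The sketch below computes a best-case reload time for a full buffer pool. The 150 MB/s read rate is an assumed figure for a SATA mirror, not a measurement from this incident, and the function name is illustrative.

```python
def warmup_seconds(buffer_pool_gb, read_mb_per_s):
    """Best-case seconds to read a full buffer pool back from disk."""
    return buffer_pool_gb * 1024 / read_mb_per_s


# 240GB of buffer pool at an assumed 150 MB/s of sustained reads.
seconds = warmup_seconds(240, 150)
print(f"{seconds / 60:.0f} minutes at best")
```

Even this optimistic figure is close to half an hour, and it ignores that the reload is not purely sequential and competes with live read/write traffic on the same disks.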

Self Induced Pain: Prevention

• Percona Toolkit

  • pt-sift

  • pt-stalk

  • pt-kill

• Optimize for IO

MySQL, MySQL! What Have Suffereth Ye Thee?

MySQL, MySQL! What Have Suffereth Ye Thee?: Grind to a Halt

• Slow queries

• Connections build up

• Slow response times

• Long running transactions

• Stop the world scenario

MySQL, MySQL! What Have Suffereth Ye Thee?: Grind to a Halt

--innodb--
txns: 486xACTIVE (28s) 994xnot (0s) 227xLOCK WAIT (25844s)
0 queries inside InnoDB, 0 queries in queue
Main thread: sleeping, pending reads 0, writes 28, flush 1
Log: lsn = 2147483647, chkp = 2147483647, chkp age = 210625191

MySQL, MySQL! What Have Suffereth Ye Thee?: Grind to a Halt

---TRANSACTION 230207990, ACTIVE 13779 sec fetching rows
mysql tables in use 1, locked 1
80337 lock struct(s), heap size 8271400, 10979242 row lock(s)
MySQL thread id 671621, OS thread handle 0x7fe03528a700, query id 37505085 localhost magento Sending data

SELECT `sales_flat_quote_item`.* FROM `sales_flat_quote_item` LIMIT 376 OFFSET 491056
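
Transactions like this one can be picked out of the InnoDB status text automatically. The sketch below is a minimal parser that only understands the `---TRANSACTION <id>, ACTIVE <n> sec` header format shown above. The function name and the 60 second default threshold are assumptions for illustration.

```python
import re

# Matches transaction headers such as:
# ---TRANSACTION 230207990, ACTIVE 13779 sec fetching rows
TRX_RE = re.compile(r"---TRANSACTION (\d+), ACTIVE (\d+) sec")


def long_running_transactions(innodb_status, min_seconds=60):
    """Return (trx_id, active_seconds) pairs at or above the threshold."""
    found = []
    for trx_id, secs in TRX_RE.findall(innodb_status):
        if int(secs) >= min_seconds:
            found.append((int(trx_id), int(secs)))
    # Longest-running first, since those hold locks the longest.
    return sorted(found, key=lambda pair: pair[1], reverse=True)
```

Fed with the output of `SHOW ENGINE INNODB STATUS`, a check like this would flag the 13779 second transaction long before 227 sessions were stuck in LOCK WAIT.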

MySQL, MySQL! What Have Suffereth Ye Thee?: Grind to a Halt

• Kill long running trx

• pt-kill for persistent long running trx

• Deploy immediate code changes to disable erring code

MySQL, MySQL! What Have Suffereth Ye Thee?: CPU Load

• MySQL is still responding

• All sorts of mutexes

  • trx_sys->mutex

  • block->lock

  • lock_sys->mutex

  • lock_sys->wait_mutex

• … and is killing latency

• Service impact means lost income

MySQL, MySQL! What Have Suffereth Ye Thee?: CPU Load

• innodb_thread_concurrency > 0

MySQL, MySQL! What Have Suffereth Ye Thee?: Prevention

• pt-kill --log

• Separate your OLTP from analytics if possible

• Proactive analysis on performance and queries

  • pt-query-digest (PMM)

  • pt-stalk

• MySQL Server Configuration

  • Remember to tune innodb_thread_concurrency (default is 0)

  • innodb_concurrency_tickets, innodb_sync_spin_loops, etc.

• Application Stack Configuration (Schema Design)

  • Single tenant per schema

  • Multiple tenants per schema (each table has client_id column)

  • All tenants in one schema

Wizard of OS: Disk Performance

Wizard of OS: Disk Performance

• Disk performance cascading to MySQL to application

Wizard of OS: Disk Performance

• Slow writes, binlogs, redo logs, syncs

• Transactions stalling on COMMIT, updating, inserting …

• Replication getting delayed if node is a slave

• Translates to latency

Wizard of OS: Disk Performance

• RAID Controller in Write-Through

• Could also be bad disk

• Default IO elevator – deadline|noop

• Bad mount options (add noatime)

• NFS?

Wizard of OS: Swapping

• Swapping heavily, with significant amount of RAM free

Wizard of OS: Swapping

• Swapping induces significant amount of IO

• Swapping in and out of disk is mighty expensive

• Affects MySQL in magnificent ways

• Swap insanity!

Wizard of OS: Swapping

• NUMA Interleave

• Percona Server is NUMA configurable

  • numa_interleave

  • flush_caches

• Check numastat - perl check_numa.pl

Wizard of OS: Prevention

• Tune

  • NUMA Policy

  • vm.swappiness (always have swap space)

  • mount options - noatime

  • Disk scheduler/IO Elevator – noop|deadline

• Blog: Linux performance tuning tips for MySQL

• Blog: InnoDB performance optimization basics (redux)
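
Checking the disk scheduler item from the list above can be automated. The sketch below assumes a Linux host where `/sys/block/<device>/queue/scheduler` lists the available elevators with the active one in square brackets, for example `noop [deadline] cfq`. The function names are illustrative.

```python
def active_scheduler(sysfs_text):
    """Extract the bracketed scheduler name from the sysfs line."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None  # no scheduler marked as active


def read_scheduler(device):
    """Read the active scheduler for a block device such as 'sda'."""
    with open(f"/sys/block/{device}/queue/scheduler") as handle:
        return active_scheduler(handle.read())
```

A monitoring check could compare the result against noop or deadline and warn when a host boots with a different elevator after a kernel or image change.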

Summary

• Be proactive and analyze your performance regularly

  • https://www.percona.com/blog/2016/04/18/percona-monitoring-and-management/

• Monitor, monitor wisely

• Test, tune, repeat

• Plan and plan more

Join us at Percona Live Europe

When: October 3-5, 2016

Where: Amsterdam, Netherlands

The Percona Live Open Source Database Conference is a great event for users of any level using open source database technologies.

§ Get briefed on the hottest topics

§ Learn about building and maintaining high-performing deployments

§ Listen to technical experts and top industry leaders

Get the advanced registration rate before prices go up on Sep 5th! Register now!

Sponsorship opportunities available as well here.

Questions?

DATABASE PERFORMANCE MATTERS