Upload
lamdan
View
222
Download
0
Embed Size (px)
Citation preview
Tuning Linux for your database
Colin Charles, Team MariaDB, MariaDB [email protected] / [email protected]
http://bytebot.net/blog/ | @bytebot on TwitterOSDC, Berlin, Germany
28 April 2016
1
whoami• Work on MariaDB at MariaDB Corporation
(SkySQL Ab)
• Merged with Monty Program Ab, makers of MariaDB
• Formerly MySQL AB (exit: Sun Microsystems)
• Past lives include Fedora Project (FESCO), OpenOffice.org
• MySQL Community Contributor of the Year Award winner 2014
2
Operating System
• Focus of this talk & tools: Linux
• But distributions make a difference too
• like RHEL6 has tuned but Ubuntu doesn’t
• Distribution versions can make a difference too
• RHEL5: ktune; RHEL6: tuned; RHEL7: tuned throughput-performance
4
Servers
• When you buy servers... do you think in terms of CPU, then memory, then I/O?
• Database servers clearly love I/O more than memory more than CPU
• queries! scans of large tables? index scans with random I/O? BI?
• Durability requires each transaction commit to be flushed to disk - i.e. call fsync()
5
I/O
• What kind of storage do you use?
• SAS or SATA?
• Ethernet (NAS, DRBD)
• SSD
• Fusion IO (NVMFS)
6
Schedulers (I/O Elevators)
• cfq: default, great for slower storage (SATA)
• noop: low latency storage (SSD)
• deadline: multi-process apps
• Most database workloads benefit from the deadline scheduler (goal is after all to minimise seeks, prioritise process I/O)
7
Elevators continued
• Boot time: elevator=deadline
•echo “deadline” > /sys/class/block/sdaN/queue/scheduler
• http://dimitrik.free.fr/blog/archives/2012/01/mysql-performance-linux-io.html
• http://www.mysqlperformanceblog.com/2009/01/30/linux-schedulers-in-tpcc-like-benchmark/
8
Sample elevator results
9
1 thread tps
1 thread ioutil%
8 threads tps
8 threads ioutil%
deadline* 15000 <1% 19100 <1%
deadline 6850 32% 15250 45%
cfq 5880 45% 1240 94%
File systems
• Choices: ext4, XFS
• mount options:
• ext4: rw, nosuid, noatime, data=ordered, nobarrier
• xfs: nobarrier
• Turn off I/O barriers
• http://lwn.net/Articles/283161/
• Separate files (data/logs/undo/etc.) or not?
• Don’t forget flash backed write caches or battery backed write cache on RAID cards (way more fsync/s)
• iostat -dmx <interval> is your friend
10
SSD & Flash• SSD
• much more IOPS than traditional disks
• lower latency
• make sure you have a new RAID controller
• Flash
• more IOPS than SSD
• very low latency
• InnoDB doublewrite buffer -- MariaDB 10.0, Percona Server 5.6 (Fusion IO), MySQL 5.7
11
RAID
• RAID0 - fast writes, reads, no redundancy
• RAID1 - slower writes, fast reads
• RAID5 - slow for random writes, fast for sequential writes, fast reads too but slow recovery
• RAID10 - fast reads & writes
12
LVM
• Why?
• easily expand disk
• snapshot backups
• Why not?
• 2-3% performance penalty
• Snapshot penalties
13
Memory• The more RAM the better -> reduces I/O requirements
• Use ECC RAM
• NUMA - “swap insanity”
• http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
•numactl --interleave=all mysqld &
• Twitter patch/Percona Server 5.6
• numa_interleave option
• innodb_buffer_pool_populate - NUMA decisions to be made when buffer cache is clean
14
Why does MySQL need memory?
• Per-session buffers
• sort buffer, temp tables
• Metadata/locking
• index statistics, table definitions
• Caching data
• decrease reads: faster response time
• decrease random writes: queues write in cache; perform more sequential writes
15
Where is memory used?
• filesystem cache
• binary / relay logs
• MyISAM data relies on filesystem cache
• MySQL cache:
• innodb_buffer_pool_size (pages), MyISAM’s key_buffer_size (indexes)
• Per session buffers
•sort_buffer_size, join_buffer_size, read_buffer_size, tmp_table_size
•free -m / SHOW ENGINE INNODB STATUS
16
Transparent Huge pages
• 2M (4M even) pages vs 4K standard Linux page
•vm.nr_hugepages=8192
• MySQL can benefit from transparent huge pages, MongoDB doesn’t (neither does Redis)
17
Swappiness• Controls how aggressively the system reclaims
“mapped” memory
• default: 60, vm.swappiness=0 is ok (but not for newer kernels!)
• decrease: aggressive reclaiming of unmapped pagecache memory
• increase: aggressive swapping of mapped memory
• 3.5 kernels change behaviour (and this was backported to RHEL 2.6.32-303)
18
CPU
• default is ondemand, you should turn it to performance generally
•echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
• disable powersave mode
• cpufrequtils is where it’s at
• look at BIOS: turbo mode/C-state/OS power control
19
Network
• Tools: tc, dropwatch
• Use gigabit Ethernet (more bandwidth/throughput, less latency)
• Can also trunk/bond for HA
• net.ipv4.tcp_no_metrics_save=1
20
Network II
• ARP filter - prevents ARP flux
•echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
• MTU size at 9,000 - “jumbo frames”
• sar -n DEV
21
Network III• allow more connections to queue up: echo 4096 > /proc/sys/net/ipv4/tcp_max_syn_backlog
• /etc/sysctl.conf - net.ipv4.tcp_max_syn_backlog = 4096
• increase kernel packet buffer: net.core.netdev_max_backlog = 4096
• increase socket connection wait queue: net.core.somaxconn = 4096
• Reduce TCP timeout that comes after closing socket (default 60s): net.ipv4.tcp_fin_timeout = 30
22
Network IV
• Fully synchronous replication, ala, Galera Cluster? (NDBCluster too)
• Fastest as your slowest node
• evs.send_window + evs.user_send_window (something large e.g. 512)
23
Resource Limits• ulimits / /etc/security/limits.conf - they need to be
high/sensible over the defaults
• -f (file size): unlimited
• -t (cpu time): unlimited
• -v (virtmem): unlimited
• -n (open files): 64000
• -m (memsize): unlimited
• -u (processes/threads): 32000
24
Tune that database
• Focus on database design - find slow, crappy queries and fix them (eg. mysql: pt-query-digest)
• Reduce locking or time waiting - always benchmark your application continually
• Don’t forget /etc/my.cnf (and appropriate friends)
25
KVM/Xen/Virtualization
• Most of your raw database performance comes from being hardware-centric
• KVM at device level you can turn off caching
• Native AIO used over threaded nowadays
• VMWare: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008542
26
Containers
• Kubernetes - vitess.io rolls out w/o issue, CoreOS, Docker
• Remember when benchmarking, you have to look at local vs remote benchmarking (networking can add overhead)
• 1 thread (up to 25% difference), but average seems to be 15% loss for r/w workloads; remotely? 32% is possible
27
EC2
• EC2 - instance limitations
• EBS - unpredictable I/O performance, use RAID10 or RAID0
• RDS - similar performance between EBS RAID10 MySQL & RDS
• choice of MySQL 5.7 (and older) or MariaDB Server 10.0 or PostgreSQL
28
Why MariaDB/Percona Server
• Threadpool
• NUMA interleaving
• Numerous performance fixes in XtraDB like fast InnoDB restarts, parallel doublewrite buffer, etc.
• Can run (supported) MariaDB Galera Cluster/Percona XtraDB Cluster
29
Hardware overall
• Test everything
• Sometimes NIC’s are a bad batch or the driver is buggy (RHEL5 + some Broadcom NICs drop packets under high load - bnx2 before 2.6.33)
• Sometimes you get a bad batch of disk
30
Hadoop/Hbase
• running this in EC2 is problematic
• make sure name resolution works
• enable bonding
• networking: hadoop uses big buffers (64MB blocks by default)
• net.ipv4.tcp_rmem = 32768 436600 4194304
• net.ipv4.tcp_wmem = 32768 436600 4194304
• http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_cluster-planning-guide/content/ch_hardware-recommendations.html
31
Other databases
• MongoDB: http://docs.mongodb.org/manual/administration/production-notes/
• PostgreSQL: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
• Neo4j: http://docs.neo4j.org/chunked/stable/configuration-linux-notes.html
• Riak: http://docs.basho.com/riak/1.4.0/cookbooks/Linux-Performance-Tuning/
32
SELinux, firewalls, security
• The easy way is to “disable SELinux”
• The reality is you should learn it
• Its just like a firewall that you really should configure
33
Tools• ps
• vmstat 1
• iostat
• top
• free
• sar (sar -n DEV 1)
• gdb
• tcpdump
• strace
• oprofile
• htop
• MySQL only: Percona Toolkit (pt-diskstats, pt-summary, pt-query-digest, etc.)
34
Benchmarking• sysbench
• OLTP test, use tables with 20M rows and 20M transactions, check 1-128 threads/run
• LinkBench
• Yahoo! Cloud Serving Benchmark
• https://github.com/brianfrankcooper/YCSB
• Google’s PerfKit Benchmarker
• https://github.com/GoogleCloudPlatform/PerfKitBenchmarker
35
Resources
• http://doc.opensuse.org/products/draft/SLES/SLES-tuning_sd_draft/book.sle.tuning.html
• https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Performance_Tuning_Guide/index.html
• https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Virtualization_Tuning_and_Optimization_Guide/index.html
37
A word from your sponsors• MariaDB Corporation sponsored my T&E to
come to OSDC
• Remember you can buy a MariaDB Enterprise subscription to help you with your cloud deployments, MariaDB Server usage, etc.
• Services include: 24x7 support, consulting, training, remote DBA services, non-recurring engineering and more for all variants of MySQL
Thanks/Q&AColin Charles, [email protected] | [email protected]
http://bytebot.net/blog/ | @bytebotslides: slideshare.net/bytebot/
Download MariaDB and give it a try: http://mariadb.org/
39