#ITDEVCONNECTIONS | ITDEVCONNECTIONS.COM Performance Tuning: AWS EC2 and You James Berger SRE Splunk

files.informatandm.com/uploads/2018/10/Performance... · 2018-10-15

Performance Tuning: AWS EC2 and You

James Berger, SRE

Splunk

Brief Intro

Who am I?

What do I do now?

What did I do previously?

How did I get in here?

Profiling Your Workload

What is your biggest constraint?

CPU? Disk? Memory? Network?

How do you find out if you don't know already?

“I don’t know.”

So how can we determine that?

We could estimate on the back of an envelope…

“So it’s a database. I know it’s going to be memory hungry, so I’ll go with the largest x1e instance type I can find. I don’t care how much I spend.”

Profiling Your Workload

No one here has ever been asked if they can decrease their AWS spend.

No one.

Never.

Profiling Your Workload

Better:

“OK, back of the envelope math won’t cut it. Let’s do some benchmarking with artificial load similar to what we think we’ll be seeing in the near future.”

Profiling Your Workload

Best:

“OK, artificial load tests aren’t cutting it. Can we test with our actual workload?”

Profiling Your Workload

So what do I do if I already know my biggest bottleneck is CPU / Disk / Memory / Network?

Profiling Your Workload: CPU

CPU bound?

What are some symptoms of this?

“I moved from a c5.xlarge (4 vCPUs) to a c5.2xlarge (8 vCPUs) and performance improved!”

“mpstat -P ALL shows %idle below 5% per core”

“CPU utilization is always high and my developers tell me it's not a bug!”
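
The per-core idle figure that mpstat reports can be approximated from two snapshots of /proc/stat. The sketch below is a minimal illustration, assuming the usual Linux column order (user, nice, system, idle, iowait, irq, softirq, steal). The snapshot strings and the function names are made up for the example and are not from the talk.

```python
def parse_cpu_lines(stat_text):
    """Return {cpu_name: tick counters} for the per-core lines (cpu0, cpu1, ...)."""
    cores = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines such as "intr" or "ctxt".
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            # Keep the first eight counters; guest time is already counted in user time.
            cores[fields[0]] = [int(x) for x in fields[1:9]]
    return cores


def idle_percent_per_core(before, after):
    """Percentage of ticks each core spent idle between two /proc/stat snapshots."""
    first, second = parse_cpu_lines(before), parse_cpu_lines(after)
    result = {}
    for name in first:
        deltas = [new - old for old, new in zip(first[name], second[name])]
        total = sum(deltas)
        idle_ticks = deltas[3]  # the fourth counter is idle time
        result[name] = 100.0 * idle_ticks / total if total else 0.0
    return result


# Hypothetical snapshots taken one interval apart.
BEFORE = "cpu  200 0 100 700 0 0 0 0\ncpu0 100 0 50 350 0 0 0 0\ncpu1 100 0 50 350 0 0 0 0\n"
AFTER = "cpu  395 0 100 705 0 0 0 0\ncpu0 198 0 50 352 0 0 0 0\ncpu1 197 0 50 353 0 0 0 0\n"

idle = idle_percent_per_core(BEFORE, AFTER)
# The 5% threshold is the one quoted in the symptom above.
cpu_bound = all(pct < 5.0 for pct in idle.values())
print(idle, cpu_bound)
```

On a live instance the two snapshots would come from reading /proc/stat a few seconds apart.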

Profiling Your Workload: CPU

Quick note - “So what does an AWS vCPU actually equate to anyway?”

– Not a discrete CPU core

– Deducing the number of actual cores from the number of hyperthreads

– Does it matter?

Profiling Your Workload: CPU

A quick comparison of C5 instance types:

So what's the difference between a c5.large and a c5.18xlarge?

2 vCPUs vs 72 vCPUs

4 GB of RAM vs 144 GB of RAM

“Maybe” 10 Gbps vs 25 Gbps

$62 a month vs $2,529 a month
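
To put the comparison in proportion, here is a small worked calculation using only the figures on this slide. The dictionary layout and function names are illustrative, and the prices are the slide's, not current list prices.

```python
# Figures copied from the slide: smallest vs largest C5 size.
smallest = {"vcpus": 2, "ram_gb": 4, "usd_per_month": 62}
largest = {"vcpus": 72, "ram_gb": 144, "usd_per_month": 2529}


def ratio(key):
    """How many times bigger the largest size is for a given attribute."""
    return largest[key] / smallest[key]


def usd_per_vcpu(instance):
    """Monthly cost divided by the number of vCPUs."""
    return instance["usd_per_month"] / instance["vcpus"]


print(f"vCPUs: {ratio('vcpus'):.0f}x, RAM: {ratio('ram_gb'):.0f}x, "
      f"cost: {ratio('usd_per_month'):.1f}x")
print(f"USD per vCPU per month: {usd_per_vcpu(smallest):.2f} "
      f"vs {usd_per_vcpu(largest):.2f}")
```

On these numbers, compute and memory grow by the same factor while the monthly cost grows slightly faster.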

Profiling Your Workload: CPU

What is a “burstable” CPU and why should I care?

– Saving money when you're only CPU bound some of the time

– CPU credit mechanism

– What happens when I run out of CPU credits?

– Monitoring CPU credit usage
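
The credit mechanism can be sketched as a simple balance: one CPU credit is one vCPU running at 100% for one minute. The simulation below is a rough model under that definition. The earn rate, starting balance and cap are hypothetical parameters, and it does not model the throttling to baseline that happens once the balance is empty.

```python
def simulate_credits(utilization_by_minute, vcpus, credits_per_hour,
                     starting_credits, max_credits):
    """Track a CPU credit balance minute by minute.

    utilization_by_minute holds fractions between 0.0 and 1.0.
    Returns the balance after each minute and the index of the first
    minute at which the balance reached zero (None if it never did).
    """
    balance = float(starting_credits)
    history = []
    exhausted_at = None
    for minute, utilization in enumerate(utilization_by_minute):
        earned = credits_per_hour / 60.0
        spent = vcpus * utilization
        balance = min(max_credits, max(0.0, balance + earned - spent))
        if balance == 0.0 and exhausted_at is None:
            exhausted_at = minute
        history.append(balance)
    return history, exhausted_at


# Hypothetical instance: 2 vCPUs, 24 credits earned per hour, 30 credits banked.
busy_history, busy_exhausted = simulate_credits([1.0] * 30, 2, 24, 30, 576)
idle_history, idle_exhausted = simulate_credits([0.0] * 60, 2, 24, 0, 576)
print(busy_exhausted, round(idle_history[-1], 2))
```

Feeding in real per-minute utilization gives a quick estimate of how long a burst can last before the balance is gone.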

Profiling Your Workload: Disk

Disk bound?

What are some symptoms of this?

“I keep running out of disk space”

“My storage isn't fast enough!”

“I need (x) IOPs and I'm not getting them!”

“My iowait is really, really high”

Profiling Your Workload: Disk

A quick summary of the available storage types on AWS:

Instance store aka ephemeral aka “I hate my data”

EBS – magnetic, general purpose (SSD) and provisioned IOPs (I hate money)

EFS – Replacing NFS, for when you need to share data across multiple instances

S3 – Larger, slower, cheaper, and why object store based storage is not to be mistaken for block based storage (S3FS is not your friend)

Glacier – Cheaper S3 storage until you need to read it

Storage Gateway (You should probably be using AWS EFS unless using it for on-prem)

Profiling Your Workload: Disk

So what is instance store storage anyway and why is it terrible?

Local, non-redundant storage, either magnetic hard drives, or SSD / NVMe SSD

“That doesn’t sound… So terrible. Why the fuss?”

“Why shouldn't I trust it? It's so cheap!”

“I hate my data and I don't care about it.”

• The perfect use case for instance store!

Profiling Your Workload: Disk

S3 looks really cheap!

So this S3FS thing lets me mount an S3 bucket as a local volume! This is awesome!

“Hey, I ran fsck on my S3FS volume and now all of my data is gone, can you help me?”

Profiling Your Workload: Disk

So S3 is object based storage and EBS and instance store are block based – how does that matter in my use case anyway? Why should I care?

Advantages of object based storage

Why object based storage isn't a replacement for block based storage

Is there a good use case for object based storage?

Profiling Your Workload: Disk

EBS – incredibly redundant, multiple copies of your data on multiple servers

EBS – Has its own network pipe, doesn't fight the rest of your network traffic

EBS snapshots – my life is saved!

EBS – Larger, faster volumes

EBS elastic volumes (post 2017) – scaling on the fly!

EBS storage types – magnetic, General Purpose (SSD), Provisioned IOPS, Throughput Optimized HDD and Cold HDD

Profiling Your Workload: Disk

EBS – How redundant is it anyway? How safe is my data?

What does five nines (99.999%) of availability actually equate to?
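
The five nines question is plain arithmetic: the unavailable fraction multiplied by the minutes in a year. A minimal sketch follows, assuming a 365-day year. Note that availability bounds downtime, not data loss, which is a separate durability question.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent):
    """Minutes per year a service may be down at a given availability."""
    return (1 - availability_percent / 100.0) * MINUTES_PER_YEAR


for nines in (99.9, 99.99, 99.999):
    print(f"{nines}% -> {allowed_downtime_minutes(nines):.2f} minutes per year")
```

Five nines works out to a little over five minutes of downtime per year.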

Profiling Your Workload: Disk

EBS – Why should I go with magnetic storage in this day and age?

Previous generation magnetic vs newer Throughput Optimized HDD or Cold HDD

Cost is a big factor for us! IOPs, less so.

Profiling Your Workload: Disk

EBS – So my DBA keeps telling me we're getting choked by a lack of IOPS – are provisioned IOPS really worth it?

Is money an object?

– No? Provisioned IOPS all day, every day.

– Yes! Time to profile SSD based volume performance and EBS optimized instance types

– Maybe? Switching volume types on the fly if you have the right instance type (C1, C3, CC2, CR1, G2, I2, M1, M3, and R3)

Profiling Your Workload: Disk

EBS – Is it worth it to go with an EBS optimized instance type?

“I know exactly how many IOPS I need and I'm not getting them right now.” YES.

“I know how many IOPS I need, I'm not getting them and I'm not quite ready to set my money on fire with provisioned IOPS yet.” YES.

Profiling Your Workload: Memory

Memory bound?

Symptoms?

“I've been all over this app and it's not a memory leak!”

“Performance increases as we increase the amount of available RAM”

“We enabled swapping to survive and it's terrible”

“CPU utilization isn't terribly high”

“The iowait percentage is reasonably low”

Profiling Your Workload: Network

Network bound?

Symptoms?

“I've looked at iftop / vnstat / iptraf and it's scary”

“We moved from a C4 to a C5 and stopped dropping packets”

“We've researched moving to a CDN”

Profiling Your Workload: Network

Network bound?

Latency vs jitter vs packet loss:

Latency

Jitter

Packet loss
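
The three terms measure different things, and all three can be derived from one series of probes. The sketch below is one simple way to do it, assuming round-trip times in milliseconds with None standing in for a probe that got no reply. Jitter is taken here as the mean absolute difference between consecutive replies, which is only one of several definitions in use.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize latency, jitter and packet loss from a list of probe results."""
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    latency_ms = mean(replies) if replies else None
    # Simplification: replies on either side of a lost probe count as consecutive.
    diffs = [abs(b - a) for a, b in zip(replies, replies[1:])]
    jitter_ms = mean(diffs) if diffs else 0.0
    return {"latency_ms": latency_ms, "jitter_ms": jitter_ms, "loss_pct": loss_pct}


# Hypothetical probe results: five probes, one of them lost.
sample = [20.0, 22.0, None, 21.0, 35.0]
summary = summarize_probes(sample)
print(summary)
```

A link can have low average latency and still have enough jitter or loss to hurt an application, which is why the three are tracked separately.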

Profiling Your Workload: Network

Network bound?

Can you use a CDN to solve these problems?

When is a CDN the appropriate solution?

– Global vs local user base

– Large amounts of static resources

Profiling Your Workload: Network

Network bound?

Enhanced AWS networking and you

Verifying cpu0 is not handling 100% of your network load while the other cores sit idle

Intra-VPC bandwidth vs AZ to AZ bandwidth vs region to region bandwidth

When you'd like to exceed the speed of light but are unable to
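
One way to check the cpu0 point is to look at how receive softirqs are spread across cores in /proc/softirqs. The sketch below parses that file's table layout from a string. The sample text and the function name are invented for the example, and because the real counters are cumulative since boot, comparing two snapshots gives a truer picture than a single read.

```python
def net_rx_share(softirqs_text):
    """Return each CPU's share (0.0 to 1.0) of NET_RX softirqs."""
    lines = softirqs_text.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        label, _, counts = line.partition(":")
        if label.strip() == "NET_RX":
            values = [int(v) for v in counts.split()]
            total = sum(values)
            return {cpu: (v / total if total else 0.0)
                    for cpu, v in zip(cpus, values)}
    raise ValueError("no NET_RX row found")


# Hypothetical contents in the /proc/softirqs layout.
SAMPLE = """
                    CPU0       CPU1       CPU2       CPU3
          HI:          0          0          0          0
      NET_TX:         10         12          9         11
      NET_RX:       9700        100        100        100
"""

shares = net_rx_share(SAMPLE)
busiest = max(shares, key=shares.get)
print(busiest, shares[busiest])
```

In this made-up sample almost all receive processing lands on one core, which is the pattern the slide warns about.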

Profiling Your Workload

What is the worst acceptable performance for your workload?

Who defines this? Users? Internal demands? Workload demands?

What are the consequences for poor performance?

– User base loss

– Financial loss

– Loss of self respect /s

Profiling Your Workload

Why go with newer instance types?

Migrating from an m4.xlarge to an m5.xlarge

I saved money and performance increased, what now?

When scaling up the instance isn’t enough:

Auto Scaling

How does auto scaling help with performance?

Auto Scaling

Let’s look at a spike in traffic:

Auto Scaling

What would happen if you didn’t have auto scaling?

– Upset customers?

– Lost sales?

– Lost reputation?

Consumers are fickle!

Worst case scenario?

Performance requires having a service that is available to begin with!

In general, customers can have the biggest impact on performance.

– Too few vs what you estimated and you over-pay for resources

– Too many vs what you estimated and you suffer downtime

– Auto scaling provides a buffer when demand cannot be easily forecast

Auto Scaling

Avoiding churn

• What is churn anyway?

• Why should I care?

Keeping the bills low

• Billing alerts

• A quick note on instance reservations

• A quick note on profiling cost with Cost Explorer

Auto Scaling

Setting appropriate scaling thresholds

– How much work does an instance complete before being terminated?

– Scale up fast, scale down slow

– Time between launch and the instance doing usable work

– Why maintaining a pool of hot spares is a terrible idea
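
“Scale up fast, scale down slow” can be expressed as an asymmetric rule: react to a single hot period, but require several quiet periods before removing anything. The sketch below illustrates that idea only. Every threshold, period count and limit is a hypothetical value, and it is not how any particular AWS scaling policy is implemented.

```python
def desired_capacity(current, cpu_history, high=70.0, low=30.0,
                     up_periods=1, down_periods=5, minimum=2, maximum=20):
    """Pick the next instance count from recent average-CPU samples (newest last)."""
    recent_up = cpu_history[-up_periods:]
    recent_down = cpu_history[-down_periods:]
    if len(recent_up) == up_periods and all(c > high for c in recent_up):
        # Scale up fast: one hot period doubles capacity, up to the maximum.
        return min(maximum, current * 2)
    if len(recent_down) == down_periods and all(c < low for c in recent_down):
        # Scale down slow: a long quiet stretch removes a single instance.
        return max(minimum, current - 1)
    return current


print(desired_capacity(4, [20, 20, 20, 20, 85]))
print(desired_capacity(4, [20, 20, 20, 20, 20]))
```

The asymmetry is what limits churn: instances are not terminated minutes after launch just because load dipped briefly.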

Auto Scaling

Handling spikes in load

– Profiling future events based on past performance

– Do you know when your load will spike?

– Strongly coupled or decoupled?

• Decoupling storage from compute

• Can my workload be decoupled?

Selecting An Instance Family

What AWS instance family should I go with?

Virtualization types on AWS

Prehistory – Entirely virtualized

History – Paravirtualized

Today – HVM and PV-HVM

Devices and SR-IOV

The Future – KVM replacing Xen

Virtualization types on AWS

Prehistory:

Entirely virtualized – CPU, memory, storage, network

Upside? No longer tied to the hardware

Downside? 100% virtualized hardware is slow!

Virtualization types on AWS

History:

Paravirtualized – The guest OS knows the truth!

What are paravirtualized drivers?

What makes them faster?

Virtualization types on AWS

Today:

HVM and PV-HVM – CPU support for virtualization

What kind of speed increase?

Virtualization types on AWS

Today:

Devices and SR-IOV – PCI-E hardware virtualization support

What is “Single Root I/O Virtualization” anyway?

Faster network adapters

Faster (PCI-E) SSD storage - NVMe

Virtualization types on AWS

The Future:

KVM replacing Xen?

What's this “Nitro” architecture anyway?

So what's managing the storage and network layers then?

Bare metal instances on AWS? What year is this?!

Measuring Performance

Calculating efficiency

– Y axis: Cost per vCPU per hr

– X axis: Cost per GB of memory per hr

– Ideal solution: Uppermost right corner

– Not one size fits all – “What about network / disk performance?”

Measuring Performance

Performance to Spend ratio

Performance to Workload completed

Performance per instance?

Performance vs instance type vs instance size

Measuring Performance

Performance to Spend ratio

How fast can I get my work done if I don't care about money?

Measuring Performance

Performance to Workload completed

– The metric you really care about!

– Diminishing returns on performance tuning

– When is “good enough” actually enough?

Measuring Performance

Performance per instance?

– Across instances with the same workload, does performance differ?

– What percentage performance difference is enough to be concerned about?

– What outliers can tell us

– When to investigate vs when to just terminate the instance and move on?

– ASG churn – beware!
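
Spotting an underperforming instance in a fleet running the same workload can be as simple as comparing each one to the fleet median. The sketch below is one possible approach. The 20% tolerance, the instance IDs and the throughput numbers are all hypothetical, and it assumes the median is not zero.

```python
from statistics import median


def find_outliers(throughput_by_instance, tolerance_pct=20.0):
    """Return instances deviating from the fleet median by more than tolerance_pct."""
    mid = median(throughput_by_instance.values())
    outliers = {}
    for instance_id, value in throughput_by_instance.items():
        deviation_pct = 100.0 * (value - mid) / mid
        if abs(deviation_pct) > tolerance_pct:
            outliers[instance_id] = round(deviation_pct, 1)
    return outliers


# Hypothetical requests per second for four instances on the same workload.
fleet = {"i-aaa": 1000, "i-bbb": 980, "i-ccc": 1020, "i-ddd": 610}
outliers = find_outliers(fleet)
print(outliers)
```

Whether a flagged instance is investigated or simply terminated is the judgment call the slide raises. The code only surfaces the candidates.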

Measuring Performance

Performance vs instance type vs instance size

• Not just scaling out, but scaling up

• Do all workloads respond positively to scaling up?

• Scaling instance sizes down when appropriate

Kernel Tuning

What kind of performance increase can I see from kernel tuning?

Quick overview of virtual file systems and kernel space vs user space

How do I tune things?

Reading current values

Setting single values

Setting groups of values – tuned to the rescue

How do I make these changes permanent?

What is context switching and why is it bad?

What things can be tuned?

OK, wow, that's a lot of settings – which should I pay attention to?

Kernel Tuning

What kind of performance increase can I see from kernel tuning?

1% to 30%

-Netflix

Depending on your monthly AWS spend, it might be worth it

Kernel Tuning

What is a pseudo file system?

Linux (UNIX) philosophy of “everything is a file”

Why?

“The whole point with "everything is a file" is not that you have some random filename..., but the fact that you can use common tools to operate on different things.”

-Linus Torvalds

“I'm always right. This time I'm just even more right than usual.”

-Also Linus Torvalds

https://www.mail-archive.com/[email protected]/msg83284.html

Kernel Tuning

A very convenient way to convey information about the current state of processes and the kernel itself.

root@here:/proc# cat 4/task/4/status
Name: bash
State: S (sleeping)
Tgid: 4
Pid: 4
PPid: 3
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 4
Groups:
VmPeak: 0 kB
VmSize: 16352 kB
VmLck: 0 kB
VmHWM: 0 kB
VmRSS: 3728 kB
VmData: 0 kB
VmStk: 0 kB
VmExe: 956 kB
VmLib: 0 kB
VmPTE: 0 kB
Threads: 1
SigQ: 0/0
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Cpus_allowed: 00000001
Cpus_allowed_list: 0
Mems_allowed: 1
Mems_allowed_list: 0
voluntary_ctxt_switches: 150
nonvoluntary_ctxt_switches: 545
root@here:/proc#

Kernel Tuning

How do I read the current value of a setting?

Kernel Tuning

Old fashioned:

root@here:~# cat /proc/sys/vm/swappiness
60
root@here:~#

Newer and improved with sysctl:

root@here:~# sysctl vm.swappiness
vm.swappiness = 60
root@here:~#
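
Both commands read the same thing: a dotted sysctl key is a path under /proc/sys with the dots turned into slashes. A minimal sketch of that mapping follows. The function names are illustrative, and the simple split ignores the rare keys whose components themselves contain a dot, such as VLAN interface names.

```python
from pathlib import Path


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted key such as vm.swappiness to its file under /proc/sys."""
    return Path(root).joinpath(*key.split("."))


def read_sysctl(key, root="/proc/sys"):
    """Read the current value of a kernel tunable as a stripped string."""
    return sysctl_path(key, root).read_text().strip()


print(sysctl_path("vm.swappiness").as_posix())
```

On a Linux host, `read_sysctl("vm.swappiness")` should return the same value the two commands above print.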

Kernel Tuning

OK, enough already – how do I tune something?

Kernel Tuning

Old fashioned:

root@here:~# cat /proc/sys/vm/swappiness
60
root@here:~#
root@here:~# echo "0" > /proc/sys/vm/swappiness
root@here:~#
root@here:~# cat /proc/sys/vm/swappiness
0
root@here:~#

Newer and improved with sysctl:

root@here:~# sysctl -w vm.swappiness=0
vm.swappiness = 0
root@here:~#

Kernel Tuning

Changing a single setting on the fly is fine when experimenting, but how do I get my changes to survive on reboot?

Kernel Tuning

Making that change permanent

root@here:~# vim /etc/sysctl.conf
...

# Do not send ICMP redirects (we are not a router)
#net.ipv4.conf.all.send_redirects = 0
#
# Do not accept IP source route packets (we are not a router)
#net.ipv4.conf.all.accept_source_route = 0
#net.ipv6.conf.all.accept_source_route = 0
#
# Log Martian Packets
#net.ipv4.conf.all.log_martians = 1
#
# Minimizing the amount of swapping
vm.swappiness = 20
vm.dirty_ratio = 80
vm.dirty_background_ratio = 5
...

add changes, save and quit :x
...load the sysctl.conf file...
root@here:~# sysctl -p
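
Before running sysctl -p it can help to see which lines in the file are live settings and which are commented out. The sketch below parses sysctl.conf-style text into a dictionary. It is a simplified reader written for illustration, not the parser sysctl itself uses, and the sample is abridged from the file shown above.

```python
def parse_sysctl_conf(text):
    """Parse sysctl.conf-style text into {key: value}, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # comment or empty line
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a "key = value" line
        settings[key.strip()] = value.strip()
    return settings


SAMPLE = """\
# Do not send ICMP redirects (we are not a router)
#net.ipv4.conf.all.send_redirects = 0
# Minimizing the amount of swapping
vm.swappiness = 20
vm.dirty_ratio = 80
vm.dirty_background_ratio = 5
"""

active = parse_sysctl_conf(SAMPLE)
print(active)
```

Only the three vm settings are active in this sample. The commented net.ipv4 line is skipped.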

Kernel Tuning

So I have a lot of things I want to change, entire groups of things – this is painful!

Tuned to the rescue!

Kernel Tuning

Using Tuned profiles

Dynamic tuning? Tell me more!

Note: Tuned settings take priority over all other saved settings

Kernel Tuning

What is context switching and why is it bad?
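
The /proc status file shown earlier already carries per-task context switch counters. A voluntary switch means the task gave up the CPU itself, typically while waiting on I/O. A nonvoluntary switch means the scheduler took the CPU away. The sketch below pulls the two counters out of status text. The sample string reuses the numbers from that earlier output, and the function name is illustrative.

```python
def context_switches(status_text):
    """Extract the context switch counters from /proc/<pid>/status text."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counts = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in wanted:
            counts[name] = int(value)  # int() tolerates the surrounding whitespace
    return counts


SAMPLE = (
    "Name:\tbash\n"
    "Threads:\t1\n"
    "voluntary_ctxt_switches:\t150\n"
    "nonvoluntary_ctxt_switches:\t545\n"
)

counts = context_switches(SAMPLE)
print(counts)
```

A nonvoluntary count that climbs quickly between two reads suggests the task is competing for CPU rather than waiting on something else.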

Kernel Tuning

So what things can be tuned?

root@here:~# sysctl -a
fs.binfmt_misc.status = enabled
fs.binfmt_misc.WSLInterop = enabled
fs.binfmt_misc.WSLInterop = interpreter /init
fs.binfmt_misc.WSLInterop = flags:
fs.binfmt_misc.WSLInterop = offset 0
fs.binfmt_misc.WSLInterop = magic 4d5a
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 128
fs.inotify.max_user_watches = 8192
kernel.cap_last_cap = 36
kernel.domainname = localdomain
kernel.hostname = here
kernel.keys.root_maxkeys = 1000000
kernel.ostype = Linux
kernel.overflowgid = 65534
kernel.overflowuid = 65534
kernel.ngroups_max = 65536
kernel.pid_max = 32768
kernel.random.entropy_avail = 4096
kernel.random.poolsize = 4096
kernel.randomize_va_space = 2
kernel.sem = 32000 1024000000 500 32000
kernel.threads-max = 32768
kernel.shmmax = 4294967295
kernel.shmmni = 4096
kernel.yama.ptrace_scope = 1
...
root@here:~#

Kernel Tuning

OK. That's a big list! Which ones should I look at in particular?

Kernel Tuning

CPU tunables

– Scheduler class

– Priorities

– Migration latency

– Tasksets and using them when you start digging

Kernel Tuning

Memory tunables

– Virtual memory

– Swappiness

– Overcommit

– OOM behavior and why OOM killer really is your friend

– “Huge pages – are they worth it?” Or “Wow, that made it worse”

– NUMA balancing on larger instances (8 or more vCPUs)

Kernel Tuning

File system tunables

– Page cache flushing behavior

– Things to tune for a given filesystem

– vm.dirty_ratio and why it matters a lot

Kernel Tuning

Storage and I/O tunables

– Read ahead size

– In-flight requests

– I/O scheduler – be careful!

– Volume stripe width – (when using magnetic storage)

– md chunk size and stripe width (magnetic specific)

– Improving SSD performance with the noop scheduler
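
The scheduler in use for a block device is listed in /sys/block/<device>/queue/scheduler, with the active one in square brackets. The sketch below parses that one-line format from a string. The sample lines are illustrative, and on newer multi-queue kernels the counterpart of noop is usually listed as none.

```python
def parse_schedulers(scheduler_text):
    """Split scheduler file contents into (active scheduler, all available schedulers)."""
    available = []
    active = None
    for token in scheduler_text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # brackets mark the active scheduler
            active = token
        available.append(token)
    return active, available


print(parse_schedulers("noop deadline [cfq]\n"))
print(parse_schedulers("[none] mq-deadline\n"))
```

Checking the active scheduler first avoids “tuning” a device that is already using the one you wanted.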

Kernel Tuning

Networking

– TCP buffer sizes

– TCP backlog

– Device backlog

– TCP reuse (careful on this one!)

– net.ipv4 tunables

– net.ipv6 tunables
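
A common starting point for sizing TCP buffers is the bandwidth-delay product: the amount of data that must be in flight to keep a link full. The sketch below is a back-of-the-envelope calculation only. The link speeds and round-trip times are hypothetical, and mapping the result onto settings such as net.ipv4.tcp_rmem or net.core.rmem_max is left as a judgment call.

```python
def bandwidth_delay_product_bytes(bandwidth_gbps, rtt_ms):
    """Bytes in flight needed to fill a link of the given speed and round-trip time."""
    bits_in_flight = bandwidth_gbps * 1e9 * rtt_ms / 1000
    return round(bits_in_flight / 8)


# Hypothetical links: a 10 Gbps path with 1 ms RTT, a 25 Gbps path with 80 ms RTT.
print(bandwidth_delay_product_bytes(10, 1))
print(bandwidth_delay_product_bytes(25, 80))
```

A buffer much smaller than this figure caps throughput on a single connection no matter how fast the network adapter is.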

Kernel Tuning

Hypervisor

– Using HVM, not PV, right?

– Kernel clocksource – which is right for your use case?

– Is clock drift an actual problem for me?

Monitoring Performance Changes

You have a dashboard, right?

What things should I be watching?

What tools can I use on the instances themselves?

What are some good remote monitoring tools?

Monitoring Performance Changes

You have a dashboard, right?

So that sounds like a lot of effort – how important is it really?

Monitoring Performance Changes

What things should I be watching?

– Total number of instances

– CPU usage per instance

– Latency spikes

– ASG churn

– Load average

– Network saturation

– Errors

– Blocking I/O

Monitoring Performance Changes

What tools can I use on the instances themselves?

– SystemTap – complex but worth it!

– strace – Sometimes the old ways are the best ways

– vmstat

– pidstat

– sar – Your friend and mine, but don't leave it on 24/7

– Load average (uptime)

– dmesg

– mpstat (especially on larger instance types)

– iostat

– free

– top and its relatives – mtop, htop, etc

– perf

– lsof

Monitoring Performance Changes

What are some good remote monitoring tools?

– Splunk – Of course. You have money, right?

– Prometheus – Roll your own and boy does it scale!

– CloudWatch – Free, painful

– Ye Olde Nagios + Thruk – Reliable!

– ELK stack – You're going to hire someone, right?

The End!

Questions?

Email: [email protected]