Performance: Observe and Tune

Paul V. Novarese, Sr. Technical Account Manager, 11 September 2014

What can we do out of the box?

What is tuned?

Tuning profile delivery mechanism

Red Hat ships tuned profiles that improve performance for many workloads...hopefully yours!

Tuned: Storage Performance Boost

tuned Profile Summary: RHEL6

Tuned: Updates for RHEL7

Installed by default!

Profiles automatically set based on install type:

Desktop/Workstation: balanced

Server/HPC: throughput-performance

Single tuned.conf file

Optional hook/callout capability

Inheritance (cf. httpd.conf)

Profiles updated for RHEL 7 features (obv)

tuned Throughput Profiles: RHEL 7

tuned Latency Profiles: RHEL 7

tuned Virt Profiles: RHEL 7

Let's get our hands dirty...

Tuning Strategies: Bang for your Buck

Problem Statements

Bad

It's slow

Make it go faster

Better

We expect 37 gigaflups/year but we only see 24

We have a bottleneck to a particular LUN in the SAN

Turn bad statements into good statements

Determine victory conditions

Get data

Look at data

Tweak

GOTO 10

Questions to ask

What's actually slow?

How do we know it's slow?

What is the expectation and what is that based on?

What is actually needed to win?

What changed?

How long has it been slow?

Gradual or sudden change?

Are there patterns? (same time every day?)

Can you do something to (temporarily) recover?

What evidence do you have? (sar, iostat, etc?)

Identify bottlenecks

CPU

Memory

IO

Network

Application

Firmware

Basic IO Tuning Strategy

Multiple HBAs

Install (e.g.) device-mapper-multipath

Default settings in /usr/share/doc/device-mapper-multipath-0.4.7/multipath.conf.defaults

Understand storage features / limitations

Maximum random and sequential reads and writes per port

Maximum random and sequential reads and writes for the controller

Low-level I/O numbers

Tools to use: dd, aiod, aio-stress, IOzone

Run I/O representative of the database implementation

I/O Schedulers

CFQ, Deadline, AS, Noop

IO Schedulers

4 tunable I/O Schedulers

CFQ: elevator=cfq. Completely Fair Queuing; the default. Balanced and fair across multiple LUNs, adapters, and SMP servers.

NOOP: elevator=noop. No-operation; a simple FIFO queue in the kernel with low CPU overhead. Leaves optimization to the ramdisk, RAID controller, etc.

Deadline: elevator=deadline. Optimizes for run-time-like behavior and low latency per IO; balances issues with large IO LUNs/controllers. Batches IO operations to produce predictable latencies.

Anticipatory: elevator=as. Inserts delays to help the stack aggregate IO; best on systems with limited physical IO (SATA).

Changing I/O Schedulers

echo deadline > /sys/block/<device>/queue/scheduler

Append 'elevator=<scheduler>' to the end of the kernel line
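
Before changing a scheduler it helps to confirm which one is active. The kernel lists the available schedulers in the queue/scheduler file and marks the active one with square brackets. A minimal sketch, assuming that bracket format; the function name active_scheduler and the sample string are illustrative, not part of the original slides:

```sh
# Print the active scheduler from the contents of a queue/scheduler file.
# The active entry is the one in square brackets, e.g. "noop [deadline] cfq".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Sample contents (assumed for illustration), not read from a real device.
active_scheduler "noop anticipatory deadline [cfq]"
```

On a live system the argument would come from the device's file, for example active_scheduler "$(cat /sys/block/sda/queue/scheduler)", where sda is an assumed device name.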

Basic CPU Tuning Strategy

Limit CPU access

One or more processes can consume all CPU cycles

Completely Fair Scheduler (CFS) in RHEL6 uses scheduler groups to assign different weights to each group

Configure cgroups and set cpu.shares for each group

Manually balance interrupts

cat /proc/interrupts to see how interrupts are distributed to each CPU

Edit /etc/sysconfig/irqbalance and set IRQBALANCE_BANNED_CPUS=

As an alternative, echo 1 > /proc/irq/142/smp_affinity

Pin processes to a specific CPU

taskset (non-NUMA)

Numactl

Cgroups

Utilize real-time scheduling (nice, MRG)
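
The value written to /proc/irq/<n>/smp_affinity is a hexadecimal bitmask in which bit N stands for CPU N, which is why "echo 1" pins the interrupt to CPU 0. A small sketch of building such a mask; the helper name cpu_mask is hypothetical, and it only handles CPUs 0 through 31 (larger systems use comma-separated mask groups, which this does not cover):

```sh
# Build the hexadecimal CPU bitmask used by /proc/irq/<n>/smp_affinity.
# Bit N of the mask corresponds to CPU N, so CPU 0 alone is mask 1.
# Sketch only: intended for CPU numbers 0-31.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0      # prints 1 (CPU 0 only)
cpu_mask 0 2    # prints 5 (binary 101: CPUs 0 and 2)
```

The result can then be written as root, for example echo "$(cpu_mask 0 2)" > /proc/irq/142/smp_affinity, reusing the IRQ number from the slide.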

Basic VM Tuning Strategy

Huge Pages

2MB huge page size

Set value in /etc/sysctl.conf (vm.nr_hugepages)

Benefits - https://access.redhat.com/knowledge/solutions/2592

Enabling - https://access.redhat.com/knowledge/solutions/46326

Transparent Huge Pages

https://access.redhat.com/knowledge/solutions/46111

NUMA

Localized memory access for certain workloads improves performance

Swap

Set the value of vm.swappiness (default 60). A lower number is better for interactive applications and avoids swapping as much as possible.
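
Sizing vm.nr_hugepages is simple arithmetic once the page size is known. A rough sketch, assuming the 2MB huge page size noted above; the function name hugepages_for_mb and the 8 GB region are made up for illustration:

```sh
# Number of 2 MB huge pages needed to cover a region of the given size in MB,
# rounded up. 2048 kB per page matches the 2MB huge page size.
hugepages_for_mb() {
    size_kb=$(( $1 * 1024 ))
    echo $(( (size_kb + 2047) / 2048 ))
}

# Hypothetical example: an 8 GB (8192 MB) shared memory region.
echo "vm.nr_hugepages = $(hugepages_for_mb 8192)"
```

The printed line is in the form /etc/sysctl.conf expects. Real sizing should also leave headroom for whatever the application allocates beyond the region itself.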

VM Tuning Frequent Fliers

/proc/sys/vm/swappiness: Should I swap or drop cache?

/proc/sys/vm/min_free_kbytes: Be careful adjusting this! Extremes are bad.

/proc/sys/vm/dirty_ratio

/proc/sys/vm/dirty_background_ratio

/proc/sys/vm/vfs_cache_pressure
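
It is worth recording the current values of these tunables before tweaking any of them. A minimal sketch that prints each one; the function name show_vm_tunables and its optional directory argument are assumptions added for illustration:

```sh
# Print "name = value" for the VM tunables listed above.
# The optional argument overrides the directory (default /proc/sys/vm).
show_vm_tunables() {
    dir=${1:-/proc/sys/vm}
    for name in swappiness min_free_kbytes dirty_ratio \
                dirty_background_ratio vfs_cache_pressure; do
        # Skip entries that are missing or unreadable on this system.
        if [ -r "$dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
        fi
    done
}

show_vm_tunables
```

Saving this output alongside the sar data gives a baseline to compare against after each tweak.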

80/20 Rule

More like 95/5

At some point our time and effort is best spent elsewhere

What tools can we use?

sar

iostat

perf - Userspace tool to read CPU counters and kernel tracepoints

Performance Co-Pilot (pcp): new in RHEL 7

Tools

Tradition: start with sar

Built-in

Collects stats for all four major system components (cpu, memory, IO, network)

Data can be easily graphed

Data collection frequency can be easily changed

RHEL 6 sar metadata is different than RHEL 5 - you cannot use RHEL 6 sar to read RHEL 5 sar files.

Collectl

More complex, but more powerful

Can handle NFS, slab data, and sub-second intervals (e.g. -i .25)

Very low overhead (