Performance Tuning, Monitoring, Management Getting the Most out of SUSE® Linux Enterprise Server Matthias G. Eckermann Senior Product Manager SUSE Linux Enterprise [email protected]

DESCRIPTION

This session discusses and demonstrates the lesser-known tools and options in SUSE Linux Enterprise Server 11. These tools, while perhaps less obvious, can be very valuable to the experienced user or administrator. This session will include an overview of:

1. Performance tuning

2. Kernel resource management

3. Built-in monitoring capabilities


© Novell, Inc. All rights reserved.

Agenda

Performance Analysis and Tuning

Kernel Resource Management with Control Groups

Built-in Monitoring Capabilities

Part I: Performance Analysis and Tuning

General Considerations (Hardware, Configuration, ...)


Hardware and Configuration

Ultimately, hardware and its configuration set the upper limits for our tuning efforts.

Are we starting with the best possible (minimum needed) hardware platform and components?

– CPU speed only critical for compute-intense tasks

– RAM (amount and speed) and interconnects do matter

– Bottleneck I/O: network bandwidth, disk,...

Is the hardware configuration appropriate?

The weakest link kills performance!

(Hardware) Configuration

Optimize storage configuration

– Optimize distribution of data across controllers/disks

– Swap to extra disk

– Use RAID with striping

Tune hardware setup (BIOS, EFI, ...)

– Only enable/probe what you have.

– Tune for fast reboot vs. startup checks (if desired)

– Carefully review all settings.

Disable unneeded services

# rc<SERVICE> stop


Identifying Problems

Where Has My Memory Gone!?

Slab Cache

– Structures of much less than one page in size

– Generic slabs of predefined sizes (32, 64) plus slabs for specific data structures

Page Cache

– Pages with actual contents of files (or block device)

– Usually the largest, by far

Buffer Cache

– File system metadata

Identifying Problems

Start by finding the bottleneck: I/O, disk, mem, ...

iostat to identify overloaded drives

– package sysstat

# iostat -x 1

vmstat for basic system usage

# vmstat 1

 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  76804   8268  14996 167396    1    1    36    64  132  197  4  1 92  3
 0  0  76804   8268  14996 167396    0    0     0     0 1023  879  3  0 97  0
 0  0  76804   8300  14996 167396    0    0     0     0 1158 1134  2  0 98  0

slabtop for slab cache use
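When scanning many vmstat samples, the columns that most often point at a bottleneck are wa (time waiting for I/O) and si/so (swap-in/swap-out). The following is a minimal sketch, not part of the original session: the function name vmstat_summary is made up for illustration, and it assumes the 16-column layout shown above (si = column 7, so = column 8, wa = column 16). Other vmstat versions may print additional columns.

```shell
# Summarize vmstat samples read from stdin.
# Assumes the 16-column layout shown above: si=$7, so=$8, wa=$16.
vmstat_summary() {
    awk '
        $1 ~ /^[0-9]+$/ {              # keep numeric rows, skip header lines
            n++
            wa += $16                  # percentage of time waiting for I/O
            swap += $7 + $8            # swap-in plus swap-out
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples=%d avg_wa=%.1f swap_traffic=%d\n", n, wa / n, swap
        }'
}

# Example: vmstat 1 5 | vmstat_summary
```

Keep in mind that the first row vmstat prints reports averages since boot, so it is often more useful to look at the later rows.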


File Systems


Picking a File System

Pick the right file system for the task

– Indexed metadata

– File sizes

– Number of files

– Workloads (database, mail server,...)

– Access paths

– Dump/Restore

SUSE® Linux Enterprise Filesystems

Feature                      Ext 3    reiserfs     XFS       OCFS 2    btrfs
Data/Metadata Journaling     •/•      ○/•          ○/•       ○/•       N/A [3]
Journal internal/external    •/•      •/•          •/•       •/○       N/A
Offline extend/shrink        •/•      •/•          ○/○       •/○       •/•
Online extend/shrink         •/○      •/○          •/○       •/○       •/•
Inode-Allocation-Map         table    u. B*-tree   B+-tree   table     B-tree
Sparse Files                 •        •            •         •         •
Tail Packing                 ○        •            ○         ○         •
Defrag                       ○        ○            •         ○         •
ExtAttr / ACLs               •/•      •/•          •/•       •/•       •/•
Quotas                       •        •            •         •         •
Dump/Restore                 •        ○            •         ○         ○
Blocksize default            4KiB
max. Filesystemsize [1]      16 TiB   16 TiB       8 EiB     16 TiB    16 EiB
max. Filesize [1]            2 TiB    1 EiB        8 EiB     1 EiB     16 EiB
Support Status               SLES     SLES         SLES      SLE HA    Technology Preview

SUSE® Linux Enterprise was the first enterprise Linux distribution to support journaling filesystems and logical volume managers back in 2000. Today, we have customers running XFS and ReiserFS with more than 8TiB in one filesystem, and the SUSE Linux Enterprise engineering team is using our 3 major Linux journaling filesystems for all their servers. We are excited to add the OCFS2 cluster filesystem to the range of supported filesystems in SUSE Linux Enterprise. For large-scale filesystems, for example for file serving (e.g., with Samba, NFS, etc.), we recommend using XFS. (In this table "•" means "available/supported"; "○" means "unsupported".)

[1] The maximum file size above can be larger than the filesystem's actual size due to usage of sparse blocks. It should also be noted that unless a filesystem comes with large file support (LFS), the maximum file size on a 32-bit system is 2 GB (2^31 bytes). Currently all of our standard filesystems (including ext3 and ReiserFS) have LFS, which gives a maximum file size of 2^63 bytes in theory. The numbers given in the above tables assume that the filesystems are using 4 KiB block size. When using different block sizes, the results are different, but 4 KiB reflects the most common standard.

[2] In this document: 1024 Bytes = 1 KiB; 1024 KiB = 1 MiB; 1024 MiB = 1 GiB; 1024 GiB = 1 TiB; 1024 TiB = 1 PiB; 1024 PiB = 1 EiB (see also http://physics.nist.gov/cuu/Units/binary.html )

[3] Btrfs is a copy-on-write logging-style file system, so rather than needing to journal changes before writing them in-place, it writes them in a new location, and then links it in. Until the last write, the new changes are not “committed.”


File Systems: ReiserFS

Applications that use many small files

– Mail servers

– NFS servers

– Database servers

or other applications that use synchronous I/O

File Systems: Ext3

Default file system in SUSE® Linux Enterprise 11

Best suited for

– Small (<100GiB) file systems

When using Ext3 with many files in one directory, consider enabling btree support (enabled by default in SUSE Linux Enterprise Server 11 SP 1)

# mkfs.ext3 -O dir_index

When using Ext3 with multiple threads appending to files in the same directory, consider turning preallocation on

# mount -o reservation

File Systems: XFS

Best suited for

– Medium (>100GiB) to very large file systems (> 1 TiB)

– Large files/many files

– Streaming multimedia (low latencies)

Special features and capabilities

– dump/restore

– online filesystem-check

– online-defragmentation

Cluster File System: OCFS2

OCFS2 (Oracle Cluster File System)

• Shared access by multiple nodes

– Ensures data integrity in case of a node-failure

– Scale-out for data access

• Generic use

– POSIX-compliant

– Cluster-aware POSIX locking

• Higher throughput

– Parallel I/O

• Disaster Tolerance

– Integration with data replication for dual node


Filesystems: btrfs

• Integrated Volume Management

• Support for copy on write

• Powerful snapshot capabilities

• Scalability

• Data integrity (checksums)

• Full community support

• Technology preview in SUSE® Linux Enterprise Server 11 SP 1

Barriers

SUSE® Linux Enterprise defaults to maximum data integrity guarantee by enforcing barriers from the file system so that reordering of journal writes cannot happen.

This may cost some performance; tunable via mount option

ReiserFS

– enable with “barrier=flush” (default)

– disable with “barrier=none”

Ext3

– enable with “barrier=1” (default)

– disable with “barrier=0”

XFS

– enable with “barrier”

– disable with “nobarrier”

Logging Modes

Journaling file systems offer different modes to write the actual data

For ReiserFS and Ext3, mount option data=<X>

– data=ordered: use barriers for data; no risk of exposing old data (default)

– data=writeback: no barriers for data; fastest in many workloads

– data=journal: use journal for data; generally slow, but can improve mail server workloads

By default, SUSE® Linux Enterprise Server ensures data integrity at the cost of some performance


Dedicated Logging Devices

ReiserFS

mkreiserfs -j /dev/xxx -s 8193 /dev/xxy

reiserfstune --journal-new-device /dev/xxx -s 8193 /dev/xxy

Ext3

mke2fs -O journal_dev /dev/xxx

mke2fs -j -J device=/dev/xxx,size=8193 /dev/xxy

tune2fs -J device=/dev/xxx,size=8193 /dev/xxy

XFS

mkfs.xfs -l logdev=/dev/xxx,size=10000b /dev/xxy


File System Tuning

Split file systems based on data access patterns

– Keep commit heavy data away from data that does not have to be synchronous

– Keep streaming writes and reads on different spindles than random I/O

Consider disabling atime updates on files and directories

# mount -o noatime,nodiratime


File System Tuning

Optimize directory layout for the file system

– Keep data that will be accessed together in the same subdirectories

– Spread data out into different subdirectories to increase large file concurrency

– Different file systems order directories differently


Block Layer

I/O Scheduler

Flexible, pluggable I/O scheduler

Selectable via boot parameter elevator=<X>

– noop

– deadline

– as (default in mainline kernels)

– cfq (default in SUSE® Linux Enterprise)

I/O Scheduler per device

– Check the active scheduler (its tunables are in /sys/block/*DEV*/queue/iosched)

cat /sys/block/*DEV*/queue/scheduler

– Set

echo SCHEDNAME > /sys/block/*DEV*/queue/scheduler
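Reading /sys/block/*DEV*/queue/scheduler lists every available scheduler and marks the active one with square brackets, for example "noop anticipatory deadline [cfq]". The following is a minimal sketch for extracting just the active name. The function name active_scheduler and its optional second argument (a path override, handy for trying the function on a saved copy) are illustrative assumptions, not part of the original session.

```shell
# Print the active I/O scheduler of a block device.
# $1: device name (e.g. sda)
# $2: optional path override (hypothetical convenience for testing)
active_scheduler() {
    dev=$1
    file=${2:-/sys/block/$dev/queue/scheduler}
    # The active scheduler is the name enclosed in [brackets].
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$file"
}

# Example: active_scheduler sda
```

Switching the scheduler still requires root and the echo command shown above; this helper only reads the current setting.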


I/O Scheduler: Noop

No reordering, just merging

Best for storage with extensive caching and scheduling of its own, such as:

MultiPathing

Activated by boot parameter elevator=noop

I/O Scheduler: Deadline

Per-request service deadline

– Caps maximum latency per request

– Maintains good disk throughput

Best for disk-intensive database applications

Activated by boot parameter elevator=deadline

I/O Scheduler: Anticipatory

Similar to “deadline”, but anticipates reads by putting them in front of the queue and delays a few ms after every read

– Maximizes throughput

– At the cost of increasing latency

Best for file servers and desktop workloads with single IDE/SATA disks.

Default in mainline kernels

Activated by boot parameter elevator=as

I/O Scheduler: CFQ

Completely Fair Queuing

Treat all competing processes equally by keeping a unique request queue for each and giving equal bandwidth to each queue

– Good compromise between throughput and latency

– Minimal worst case latency on all reads and writes

Suitable for a wide variety of applications

Default in SUSE® Linux Enterprise

Activated by boot parameter elevator=cfq


Block Layer Tuning

Spreading the load across controllers

– Per-target locking for SCSI

– Software RAID bandwidth

Battery backed caching

Block Layer Tunables

Block read ahead buffer

/sys/block/<sdX/hdX>/queue/read_ahead_kb

– Default is 128. Increase to 512 for fast storage (SCSI disks or RAID)

– May speed up streaming reads a lot

Number of requests

/sys/block/<sdX/hdX>/queue/nr_requests

– Default is 128. Increase to 256 with CFQ scheduler for fast storage

– Increases throughput at minor latency expense


Memory Management (VM)

Buffer Flushing

How to write dirty pages to disk

This can be tuned by

– /proc/sys/vm/dirty_ratio (40%)
  Generator of dirty data starts writeback.

– /proc/sys/vm/dirty_background_ratio (10%)
  Background writeback starts.

– /proc/sys/vm/dirty_expire_centisecs (3000)
  How long may dirty pages remain dirty?

– /proc/sys/vm/dirty_writeback_centisecs (500)
  How often does bdflush wake up?

Defaults are pretty high, which is good for databases (but may result in lots of unreclaimable pagecache). For other workloads (HPC) you may want to lower these.
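To get a feeling for how much dirty data these percentages allow, it helps to translate them into absolute sizes. The following is a rough sketch only: the function name dirty_threshold_mib is made up for illustration, and it assumes the percentage is applied to total RAM. The kernel actually applies it to the memory it considers dirtyable, so real thresholds are somewhat lower.

```shell
# Rough estimate of a dirty-page threshold in MiB.
# $1: total RAM in MiB, $2: ratio in percent
# Assumption: ratio is applied to total RAM (an upper bound).
dirty_threshold_mib() {
    total_mib=$1
    ratio=$2
    echo $(( total_mib * ratio / 100 ))
}

# Example for a 16 GiB machine with the defaults quoted above:
#   dirty_threshold_mib 16384 10    # background writeback threshold
#   dirty_threshold_mib 16384 40    # threshold at which writers are throttled
```

On machines with a lot of RAM, even the 10% background threshold can amount to well over a gigabyte of unwritten data, which is one reason to consider lowering the values for some workloads.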

VM: Swappiness

The threshold when processes should be swapped can be tuned via

– /proc/sys/vm/swappiness

Default is 60, which works well if you want to swap out daemons or programs which have not done a lot lately

Higher values will provide more buffer/page cache; lower values will wait longer to swap out idle processes

NUMA (1)

NUMA = Non-uniform Memory Architecture

SUSE® Linux Enterprise detects and uses NUMA topology and automatically

– Prefers memory that is local to a node;

– Evenly balances system data across nodes;

– Gracefully handles CPU-less nodes; etc.

Applications can (and should) also optimize for NUMA topology!

NUMA (2)

The NUMA system can be tuned via `numactl $CMD`; the settings then apply to $CMD and all of its children

– --preferred=255

– --membind=!0-1

– --cpunodebind=2-5

– --physcpubind=2-5

– --localalloc (always allocate from current node)

Node 0 may be the most contended, so avoid it


Miscellaneous (Scheduler, Network)

Binding Processes/Interrupts to CPUs

Problem: context switching costs

CPU affinity: binding a process to specific CPUs can improve performance

– taskset 0x3 COMMAND (start a new command)

– taskset -p 0x3 PID (change a running process)

In this example, 0x3 is a bitmask selecting CPUs 0 and 1 (CPU numbering starts at 0); 0x6 would select CPUs 1 and 2.

Bind interrupts to CPUs

– cat /proc/interrupts

– echo 0x3 > /proc/irq/0/smp_affinity

– Example: distribute NICs among CPUs.
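Affinity masks are easy to get wrong by hand: bit n of the mask stands for CPU n, counting from 0. The following is a minimal sketch that builds the hexadecimal mask from a list of CPU numbers. The function name cpu_mask is made up for illustration and is not a standard tool.

```shell
# Build a hexadecimal affinity mask from a list of CPU numbers.
# CPU numbering starts at 0, so CPU n corresponds to bit n.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
}

# cpu_mask 0 1    prints 0x3
# cpu_mask 1 2    prints 0x6
# cpu_mask 2 4    prints 0x14
```

The result can be passed to taskset directly, for example `taskset "$(cpu_mask 0 1)" COMMAND`.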

Network Improvements

Gigabit Ethernet and 10 Gigabit Ethernet

– Significant interrupt overhead reduction

– Consider Jumbo Frames (larger than 1500 bytes)

– # ifconfig <DEV> mtu 9000

NFS modes

– TCP (default) vs UDP

– NFSv3 (default) vs NFSv4

– rsize=<X>/wsize=<X>
  - read/write in chunks of <X> bytes
  - default is 1024, use 8192 for higher throughput


Application Interplay

Async I/O, O_DIRECT

Asynchronous I/O

– Specific model for concurrency

– Heavily used by databases

Direct I/O (O_DIRECT) on block devices or files

– Databases like to use raw disks. Historically /dev/raw was used, but O_DIRECT is more performant.

– Files should be preallocated (no holes, no appending); the system falls back to buffered I/O otherwise.

– In both cases: the benefit is less cache pollution

– Not specific to database workloads!

Part II: Kernel Resource Management with Control Groups


Control Groups

• Understanding control groups: An in-depth overview

– What Control Groups are designed to do

– How Control Groups work

• Using control groups in SUSE® Linux Enterprise Server 11

– Understanding the components

Understanding Control Groups: An In-depth Overview


What Are Control Groups?

Control Groups provide a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behavior.

– cgroup is another name for Control Groups

– Partition tasks (processes) into one or many groups of tree hierarchies

– Associate a set of tasks in a group to a set of subsystem parameters

– Subsystems provide the parameters that can be assigned

– Tasks are affected by the assigned parameters


Example of the Capabilities of a cgroup

Consider a large university server with various users - students, professors, system tasks etc. The resource planning for this server could be along the following lines:

CPUs

        Top cpuset (20%)
          /         \
     CPUSet1       CPUSet2
        |             |
     (Profs)      (Students)
       60%           20%

Memory

    Professors = 50%
    Students = 30%
    System = 20%

Disk I/O

    Professors = 50%
    Students = 30%
    System = 20%

Network I/O

    WWW browsing = 20%
        /          \
    Prof (15%)   Students (5%)

    Network File System (60%)
    Others (20%)

Source: /usr/src/linux/Documentation/cgroups/cgroups.txt

Control Group Subsystems

Two types of subsystems

• Isolation and special controls

– cpuset, namespace, freezer, device, checkpoint/restart

• Resource control

– cpu (scheduler), memory, disk i/o, network

Each subsystem can be mounted independently

– mount -t cgroup -o cpu none /cpu

– mount -t cgroup -o cpuset none /cpuset

or all at once

– mount -t cgroup none /cgroup

Source: http://jp.linuxfoundation.org/jp_uploads/seminar20081119/CgroupMemcgMaster.pdf


Cpuset Subsystem (Isolation)

Cpuset is for tying processes to cpu and memory.

Source: http://jp.linuxfoundation.org/jp_uploads/seminar20081119/CgroupMemcgMaster.pdf

[Figure: process groups A1, A2, and B placed on memory nodes]

Namespace Subsystem (Isolation)

The namespace subsystem shows a private view of the system to the processes in a cgroup. It is mainly used for OS-level virtualization. The subsystem itself has no special functions and just tracks changes in namespaces.

Source: http://jp.linuxfoundation.org/jp_uploads/seminar20081119/CgroupMemcgMaster.pdf


Freezer Subsystem (Control)

Freezer cgroup is for freezing (stopping) all tasks in a group.

mount -t cgroup none /freezer -o freezer

....put task into /freezer/tasks...

echo FROZEN > /freezer/freezer.state

echo RUNNING > /freezer/freezer.state

Source: http://jp.linuxfoundation.org/jp_uploads/seminar20081119/CgroupMemcgMaster.pdf


Device Subsystem (Isolation)

A system administrator can provide a list of devices that can be accessed by processes under cgroup

– Allow/Deny Rule

– Allow/Deny : READ/WRITE/MKNOD

Limits access to device or file system on a device to only tasks in specified cgroup

Source: http://jp.linuxfoundation.org/jp_uploads/seminar20081119/CgroupMemcgMaster.pdf

Checkpoint/Restart Subsystem (Control)

• Save the status of all processes in a cgroup to a dump file and restart them later (or just save the state and continue)

• Allows a “saved container” to be moved between physical machines (as a VM can be)

• Dump the images of all processes to a file

Source: http://jp.linuxfoundation.org/jp_uploads/seminar20081119/CgroupMemcgMaster.pdf


CPU Subsystem (Resource Control)

• Share CPU bandwidth between groups by group scheduling function of CFS (the scheduler)

• Mechanically complicated

[Figure: three groups with Share = 1000, Share = 2000, and Share = 4000]

Memory Subsystem (Resource Control)

• For limiting memory usage of user space processes.

• Limit LRU (Least Recently Used) pages

– Anonymous and file cache

• No limits for kernel memory

– Maybe in another subsystem if needed

Source: http://jp.linuxfoundation.org/jp_uploads/seminar20081119/CgroupMemcgMaster.pdf

Disk I/O Subsystem (Resource Control) (Draft)

• 3 proposals are currently being discussed

– dm-ioband, io-throttle, io-controller

• Consensus has not been reached, but io-controller seems to be taking the lead

– Both dm-ioband and io-throttle suffer from a significant problem: they can defeat the policies (such as I/O priority) being implemented by the I/O scheduler.

– io-controller does bandwidth control at the I/O scheduler level

– Designed to work with the mainline I/O schedulers (CFQ, deadline, anticipatory, and no-op), but requires significant changes

– Currently v4 as of June 8, 2009 and based on the 2.6.30-rc8 kernel

Source: http://jp.linuxfoundation.org/jp_uploads/seminar20081119/CgroupMemcgMaster.pdf

Source: http://lwn.net/Articles/331857/

Source: http://lwn.net/Articles/332839/

Source: http://lkml.org/lkml/2009/6/8/580

Network I/O Subsystem (Resource Control) (Draft)

Source: http://lkml.org/lkml/2008/7/22/361

Source: https://lists.linux-foundation.org/pipermail/containers/2008-August/012419.html

Source: https://lists.linux-foundation.org/pipermail/containers/2008-August/012512.html

• Like the Disk I/O subsystem, it seems the jury is still out on the implementation of this subsystem

• Kernel developers are talking about traffic control

– cgroup_tc - This patch provides a simple resource controller which uses traffic control (tc) features already in the Linux kernel

– Not much discussion on this topic since late 2008

Reading More on cgroups

Remember to install kernel source!!

– /usr/src/linux/Documentation/cgroups/cgroup.txt

– /usr/src/linux/Documentation/cpusets.txt

– /usr/src/linux/Documentation/controllers/*

– /usr/src/linux/Documentation/scheduler/sched-design-CFS.txt

– /usr/src/linux/Documentation/kernel-parameters.txt

Additional RPM packages

– libcgroup1 - /usr/share/doc/packages/libcgroup1/README*

– cpuset (Alex Tsariounov) - /usr/share/doc/packages/cpuset/cset*.txt


Reading More on cgroups (continued)

Manpages

– man cpuset

– man cset

On the web

– http://lkml.org/lkml/2009/2/9/372

– http://lkml.org/lkml/2009/2/10/140

– http://lkml.org/lkml/2008/1/29/60

– http://kerneltrap.org/mailarchive/linux-kernel/2008/6/18/2161114/thread


Using Control Groups in SUSE® Linux Enterprise Server 11

Preparing SUSE® Linux Enterprise Server 11

• Start with patched SLES11 install

• Add the following packages

– libcgroup1

– cpuset

– libcpuset1

– kernel-source (Documentation purposes)

– gcc (Needed to compile the stress tool)

What Subsystems Are Available?

A way to figure this out:

– mount -t cgroup none /cgroup

– cat /proc/mounts

Current subsystems in SUSE Linux Enterprise Server 11

– rw,freezer,devices,cpuacct,cpu,ns,cpuset

– memory – Disabled by default

> Add a kernel parameter - cgroup_enable=memory

Possible future subsystems in SLES 11

– Disk and Network subsystem controllers
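The subsystem list is buried in the mount options column of /proc/mounts. The following is a minimal sketch that prints one subsystem per line for every mounted cgroup hierarchy. The function name cgroup_subsystems and its optional path argument (useful for trying it on a saved copy of the mounts table) are illustrative assumptions. Generic mount options other than rw and ro are not filtered out.

```shell
# List the subsystems attached to each mounted cgroup hierarchy.
# $1: optional mounts table to read (defaults to /proc/mounts)
cgroup_subsystems() {
    mounts=${1:-/proc/mounts}
    awk '$3 == "cgroup" {
        # Field 4 holds the comma-separated mount options.
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] != "rw" && opts[i] != "ro")
                print $2 ": " opts[i]
    }' "$mounts"
}

# Example: cgroup_subsystems
```

For the mount shown above, each line of output pairs the mount point /cgroup with one subsystem name, such as freezer or cpuset.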

Generating Load on SUSE® Linux Enterprise Server 11

Search the Web for “linux load generator” - Results:

• http://devin.com/lookbusy/

• http://www.ibm.com/developerworks/linux/library/l-stress/index.html

– Good article

• http://ltp.sourceforge.net/

– Powerful toolkit for Linux developers

• http://hardware.slashdot.org/article.pl?sid=05/04/06/218233

– Simple scripting examples

• http://weather.ou.edu/~apw/projects/stress/

– Probably best choice

– Available (community driven) also at:

http://software.opensuse.org/search?baseproject=SUSE:SLE-11&p=1&q=stress

Crash Course on CPUSETs: The Hard Way

• Determine the number of CPUs and Memory Nodes

– Look at /proc/cpuinfo and /proc/zoneinfo

• Creating the CPUSET hierarchy

    mkdir /dev/cpuset
    mount -t cpuset cpuset /dev/cpuset
    cd /dev/cpuset
    mkdir Charlie
    cd Charlie
    /bin/echo 2-3 > cpus
    /bin/echo 1 > mems
    /bin/echo $$ > tasks
    # The current shell is now running in cpuset Charlie
    # The next line should display '/Charlie'
    cat /proc/self/cpuset

• Removing the CPUSET

    cat /dev/cpuset/Charlie/tasks    (move any remaining tasks!!)
    rmdir /dev/cpuset/Charlie

Crash Course on CPUSETs: The Easy Way – Thanks to Alex Tsariounov of Novell®

• Determine the number of CPUs and Memory Nodes

– cset set --list

• Creating the CPUSET hierarchy

– cset set --cpu=2-3 --mem=1 --set=Charlie

• Starting processes in a CPUSET

– cset proc --set Charlie --exec -- stress -c 1 &

• Moving existing processes to a CPUSET

– cset proc --move --pid PID --toset=Charlie

• List task in a CPUSET

– cset proc --list --set Charlie

• Removing a CPUSET

– cset set --destroy Charlie

Follow It Up with cgroups: The Hard Way

• Creating the cgroup hierarchy

    mkdir /dev/cgroup
    mount -t cgroup cgroup /dev/cgroup
    cd /dev/cgroup
    mkdir priority
    cd priority
    cat cpu.shares

• Understanding cpu.shares (more in sched-design-CFS.txt)

– 1024 is the default; against one competing group that also has 1024 shares = 50% utilization

– 1524 = 60% utilization

– 2048 = 67% utilization

– 512 = 33% utilization

• Changing cpu.shares

– /bin/echo 1024 > cpu.shares
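The utilization figures follow from dividing a group's shares by the sum of the shares of all groups that are competing for the CPU at that moment. The following is a minimal sketch of that arithmetic. The function name share_pct is made up for illustration, and the calculation assumes that every listed group is fully busy; idle groups do not count against the others.

```shell
# Expected CPU percentage of the first group when all listed groups are busy.
# Usage: share_pct OWN_SHARES OTHER_SHARES...
share_pct() {
    own=$1
    total=0
    for s in "$@"; do
        total=$(( total + s ))
    done
    awk -v own="$own" -v total="$total" \
        'BEGIN { printf "%.0f\n", 100 * own / total }'
}

# share_pct 1524 1024    prints 60
# share_pct 2048 1024    prints 67
```

With more than two groups, simply list all competing share values after the first argument, for example `share_pct 1024 1024 2048`.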

More cgroup Functionality to Learn

The libcgroup1 package

• Basic tools in user space to simplify resource management functionality

– uid, gid or exec rules for placement of a task

– /etc/init.d/cgconfig – setup cgroup filesystem based on /etc/cgconfig.conf

• UID/GID rules

– Managed in /etc/cgrules.conf by root user

• EXEC rules

– Fully managed by a user in a config file in their home directory

• Methods used to place task in proper cgroup

– pam_cgroup (at login); cgexec (task start); cgclassify (task move)

– User space daemon (cgred in /etc/init.d and /etc/sysconfig)

Linux Containers (LXC)

• Builds upon CGroups and specific kernel settings; use “lxc-checkconfig” to check compliance

• Fully enabled in SUSE Linux Enterprise Server 11 SP1

• Basic Functionality

lxc-execute --name=NAME -- COMMAND

• Function Overview

– lxc-start / lxc-execute / lxc-stop

– lxc-freeze / lxc-unfreeze

– Monitoring: lxc-ps, lxc-info, lxc-netstat, lxc-monitor

– Modifying CGroup parameters: lxc-cgroup

Part III: Built-in Monitoring Capabilities

Monitoring Overview and Hands-on

• Low Level

– smartmontools - Monitor for S.M.A.R.T. Disks and Devices

– sensors - Hardware health monitoring for Linux

– iptraf - TCP/IP Network Monitor

– pcp - Performance Co-Pilot (system-level performance monitoring)

– sysstat - Sar and Iostat Commands for Linux

– perfmon

– blktrace, ltrace, strace

– systemtap - Instrumentation System

• High Level

– argus – network auditing tool

– nagios


Unpublished Work of Novell, Inc. All Rights Reserved.

This work is an unpublished work and contains confidential, proprietary, and trade secret information of Novell, Inc. Access to this work is restricted to Novell employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of Novell, Inc. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.

General Disclaimer

This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Novell, Inc. makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for Novell products remains at the sole discretion of Novell. Further, Novell, Inc. reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All Novell marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.