Learn about systems monitoring, how to dump Linux on System z, real customer cases, and more. For more information, visit http://ibm.co/PNo9Cb.
© 2011 IBM Corporation
Problem Reporting and Analysis: Linux on System z – How to Survive a Linux Critical Situation
Sven Schuetz, Linux on System z Development and Service, [email protected]
IBM Live Virtual Class – Linux on System z
Agenda
– Introduction
– How to help us to help you
– Systems monitoring
– How to dump a Linux on System z
– Real customer cases
Introductory remarks
Problem analysis looks straightforward on the charts, but it might have taken weeks to complete.
A problem does not necessarily show up at its place of origin.
The more information is available, the sooner the problem can be solved, because gathering and submitting additional information again and again usually introduces delays.
This presentation can only introduce some tools and how they can be used; comprehensive documentation on their capabilities can be found in each tool's documentation.
Do not forget to update your systems
Describe the problem
Get as much information as possible about the circumstances:
– What is the problem?
– When did it happen? (date and time, important to dig into logs )
– Where did it happen? One or more systems, production or test environment?
– Is this a first time occurrence?
– If it occurred before: how frequently does it occur?
– Is there any pattern?
– Was anything changed recently?
– Is the problem reproducible?
Write down as much information as possible about the problem!
Describe the environment
Machine setup:
– Machine type (z196, z10, z9, ...)
– Storage server (ESS800, DS8000, other vendors' models)
– Storage attachment (FICON, ESCON, FCP; how many channels)
– Network (OSA (type, mode), HiperSockets), ...
Infrastructure setup:
– Clients
– Other computer systems
– Network topologies
– Disk configuration
Middleware setup:
– Databases, web servers, SAP, TSM, ... (including version information)
Troubleshooting First-Aid Kit (1/2)
Install packages required for debugging:
– s390-tools/s390-utils
  • dbginfo.sh
– sysstat
  • sadc/sar
  • iostat
– procps
  • vmstat, top, ps
– net-tools
  • netstat
– dump tools: crash / lcrash
  • lcrash (lkcdutils) available with SLES9 and SLES10
  • crash available on SLES11
  • crash in all RHEL distributions
Troubleshooting First-Aid Kit (2/2)
Collect dbginfo.sh output:
– Proactively on a healthy system
– When problems occur, then compare with the healthy system
Collect system data:
– Always archive syslog (/var/log/messages)
– Start the sadc (System Activity Data Collection) service when appropriate (please include disk statistics)
– Collect z/VM MONWRITE data when appropriate, if running under z/VM
When the system hangs:
– Take a dump
  • Include System.map, Kerntypes (if available) and the vmlinux file
– See the “Using the dump tools” book at http://download.boulder.ibm.com/ibmdl/pub/software/dw/linux390/docu/l26ddt02.pdf
Enable extended tracing in /sys/kernel/debug/s390dbf for the subsystem
dbginfo Script (1/2)
dbginfo.sh is a script that collects various system-related files for debugging purposes. It generates a tar archive that can be attached to PMRs / Bugzilla entries.
Part of the s390-tools package in SUSE and recent Red Hat distributions
– dbginfo.sh is continuously improved by service and development
It can also be downloaded directly from the developerWorks website:
http://www.ibm.com/developerworks/linux/linux390/s390-tools.html
It is similar to Red Hat's sosreport or Novell's supportconfig.
root@larsson:~> dbginfo.sh
Create target directory /tmp/DBGINFO-2011-01-15-22-06-20-t6345057
Change to target directory /tmp/DBGINFO-2011-01-15-22-06-20-t6345057
[...]
dbginfo Script (2/2)
Linux information:
– /proc/[version, cpu, meminfo, slabinfo, modules, partitions, devices, ...]
– System z specific device driver information: /proc/s390dbf (RHEL 4 only) or /sys/kernel/debug/s390dbf
– Kernel messages: /var/log/messages
– Reads configuration files in /etc/ [ccwgroup.conf, modules.conf, fstab]
– Uses several commands: ps, dmesg
– Queries setup scripts:
  • lscss, lsdasd, lsqeth, lszfcp, lstape
– And much more
z/VM information:
– Release and service level: q cplevel
– Network setup: q [lan, nic, vswitch, v osa]
– Storage setup: q [set, v dasd, v fcp, q pav, ...]
– Configuration/memory setup: q [stor, v stor, xstore, cpus, ...]
– When the system runs as a z/VM guest, ensure that the guest has the appropriate privilege class authorities to issue the commands
SADC/SAR
Capture Linux performance data with sadc/sar:
– CPU utilization
– Disk I/O overview and on device level
– Network I/O and errors on device level
– Memory usage/swapping
– ... and much more
– Reports statistics over time and creates average values for each item
SADC example (for more, see man sadc):
– System Activity Data Collector (sadc) --> data gatherer
– /usr/lib64/sa/sadc [options] [interval [count]] [binary outfile]
– /usr/lib64/sa/sadc 10 20 sadc_outfile
SADC/SAR (cont'd)
– /usr/lib64/sa/sadc -d 10 sadc_outfile
– -d option: statistics for disk
– Should be started as a service during system start
SAR example (for more, see man sar)
– System Activity Report (sar) command --> reporting tool
– sar -A
– -A option: reports all the collected statistics
– sar -A -f sadc_outfile >sar_outfile
Please include the binary sadc data and sar -A output when submitting SADC/SAR information to IBM support
CPU utilization
Per-CPU values; watch out for:
– system time (kernel time)
– iowait time (slow I/O subsystem)
– steal time (time taken by other guests)
Disk I/O rates
Read/write operations:
– per I/O device
– tps: transactions
– rd/wr_secs: sectors
Is your I/O balanced? Maybe you should stripe your LVs.
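The "is your I/O balanced?" check can be scripted. A minimal sketch, using hypothetical per-device tps figures rather than real sadc/sar output, that flags devices whose transaction rate deviates by more than 50% from the average:

```shell
# Sample data is made up for illustration; on a real system, feed in
# per-device tps values extracted from sar -d output instead.
out=$(printf '%s\n' \
  'dasda 420' \
  'dasdb 395' \
  'dasdc 12' |
awk '{ tps[$1] = $2; sum += $2; n++ }
     END {
       avg = sum / n
       for (d in tps)
         if (tps[d] < 0.5 * avg || tps[d] > 1.5 * avg)
           printf "%s: %s tps (avg %.0f) - unbalanced\n", d, tps[d], avg
     }')
echo "$out"
```

Here dasdc (nearly idle) and dasda (well above average) would be flagged, hinting that striping the logical volumes across the devices could even out the load.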
Linux on System z dump tools
DASD dump tool:
– Writes the dump directly to a DASD partition
– Uses the s390 stand-alone dump format
– ECKD and FBA DASDs supported
– Single-volume and multi-volume (for large systems) dumps possible
– Works in z/VM and in LPAR
SCSI dump tool:
– Writes the dump into a filesystem
– Uses the lkcd dump format
– Works in z/VM and in LPAR
VMDUMP:
– Writes the dump to z/VM spool space (VM reader)
– z/VM-specific dump format; the dump must be converted
– Only available when running under z/VM
Tape dump tool:
– Writes the dump directly to an ESCON/FICON tape device
– Uses the s390 stand-alone dump format
DASD dump tool – general usage
1. Format and partition the dump device:
root@larsson:~> dasdfmt -f /dev/dasd<x> -b 4096
root@larsson:~> fdasd /dev/dasd<x>
2. Prepare the dump device in Linux:
root@larsson:~> zipl -d /dev/dasd<x1>
3. Stop all CPUs
4. Store status
5. IPL the dump device
6. Copy the dump to Linux:
root@larsson:~> zgetdump /dev/dasd<x1> > dump_file
DASD dump under z/VM
Prepare the dump device under Linux, if possible in a 64-bit environment:
root@larsson:~> zipl -d /dev/dasd<x1>
After a Linux crash, issue these commands on the 3270 console:
#cp cpu all stop
#cp cpu 0 store status
#cp i <dasd_devno>
Wait until the dump is saved on the device:
00: zIPL v1.6.0 dump tool (64 bit)
00: Dumping 64 bit OS
00: 00000087 / 00000700 MB 0...
00: Dump successful
Only a disabled wait PSW on older distributions
Attach the dump device to a Linux system with the dump tools installed
Store the dump to the Linux file system from the dump device (e.g. zgetdump)
DASD dump on LPAR (1/2)
DASD dump on LPAR (2/2)
Multi volume dump
zipl can now dump to multiple DASDs; it is now possible to dump system images that are larger than a single DASD.
– You can specify up to 32 ECKD DASD partitions for a multi-volume dump
What are dumps good for?
– Full snapshot of the system state, taken at any point in time (e.g. after a system has crashed, or of a running system)
– Can be used to analyze the system state beyond the messages written to the syslog
– Obtain messages that have not been written to the syslog due to a crash
– Internal data structures not exported anywhere
Multi volume dump (cont'd)
How to prepare a set of ECKD DASD devices for a multi-volume dump (64-bit systems only):
– We use two DASDs in this example:
root@larsson:~> dasdfmt -f /dev/dasdc -b 4096
root@larsson:~> dasdfmt -f /dev/dasdd -b 4096
– Create the partitions with fdasd. The sum of the partition sizes must be sufficiently large (the memory size + 10 MB):
root@larsson:~> fdasd /dev/dasdc
root@larsson:~> fdasd /dev/dasdd
– Create a file called sample_dump_conf containing the device nodes (e.g. /dev/dasdc1) of the two partitions, separated by one or more line feed characters
– Prepare the volumes using the zipl command:
root@larsson:~> zipl -M sample_dump_conf
[...]
Multi volume dump (cont'd)
To obtain a dump with the multi-volume DASD dump tool, perform the following steps:
– Stop all CPUs; store status on the IPL CPU
– IPL the dump tool using one of the prepared volumes, either 4711 or 4712:
#cp cpu all stop
#cp cpu 0 store status
#cp ipl 4711
– After the dump tool is IPLed, you will see a message that indicates the progress of the dump. Then you can IPL Linux again
Copying a multi-volume dump to a file:
– Use zgetdump without any options to copy the dump parts to a file:
root@larsson:~> zgetdump /dev/dasdc > mv_dump_file
Multi volume dump (cont'd)
Display information about the involved volumes:
root@larsson:~> zgetdump -d /dev/dasdc
'/dev/dasdc' is part of Version 1 multi-volume dump,
which is spread along the following DASD volumes:
 0.0.4711 (online, valid)
 0.0.4712 (online, valid)
[...]
Display information about the dump itself:
root@larsson:~> zgetdump -i /dev/dasdc
Dump device: /dev/dasdc
>>> Dump header information <<<
Dump created on: Fri Aug 7 15:12:41 2009
[...]
Multivolume dump: Disk 1 (of 2)
Reading dump contents from 0.0.4711.................................
Dump ended on: Fri Aug 7 15:12:52 2009
Dump End Marker found: this dump is valid.
SCSI dump tool – general usage
1. Create a partition with PC-BIOS disk layout (fdisk)
2. Format the partition with an ext2 or ext3 filesystem
3. Install the dump tool:
– Mount and prepare the disk:
root@larsson:~> mount /dev/sda1 /dumps
root@larsson:~> zipl -D /dev/sda1 -t /dumps
– Optional: /etc/zipl.conf:
dumptofs=/dev/sda1
target=/dumps
4. Stop all CPUs
5. Store status
6. IPL the dump device
The dump tool creates dumps directly in the filesystem
SCSI dump is supported for LPARs and, as of z/VM 5.4, under z/VM
SCSI dump under z/VM
SCSI dump from z/VM is supported as of z/VM 5.4
Issue the SCSI dump:
#cp cpu all stop
#cp cpu 0 store status
#cp set dumpdev portname 47120763 00ce93a7 lun 47120000 00000000 bootprog 0
#cp ipl 4b49 dump
To access the dump, mount the dump partition
SCSI dump on LPAR
Select the CPC image for the LPAR to dump
Go to the Load panel
Issue the SCSI dump, specifying:
– FCP device
– WWPN
– LUN
VMDUMP
The only method to dump NSSes or DCSSes under z/VM
Works non-disruptively
Create the dump:
#cp vmdump to cmsguest
Receive the dump:
– Store the dump from the reader into a CMS dump file:
#cp dumpload
– Transfer the dump to Linux from CMS, e.g. via FTP
– NEW: vmur device driver:
root@larsson:~> vmur rec <spoolid> vmdump
Linux tool to convert a vmdump to lkcd format:
root@larsson:~> vmconvert vmdump linux.dump
Problem: the dump process is relatively slow
How to obtain information about a dump
Display information about the involved volume:
root@larsson:~> zgetdump -d /dev/dasdb
'/dev/dasdb' is a Version 0 dump device.
Dump size limit: none
Display information about the dump itself:
root@larsson:~> zgetdump -i /dev/dasdb1
Dump device: /dev/dasdb1
Dump created on: Thu Oct 8 15:44:49 2009
Magic number: 0xa8190173618f23fd
Version number: 3
Header size: 4096
Page size: 4096
Dumped memory: 1073741824
Dumped pages: 262144
Real memory: 1073741824
cpu id: 0xff00012320978000
System Arch: s390x (ESAME)
Build Arch: s390x (ESAME)
>>> End of Dump header <<<
Dump ended on: Thu Oct 8 15:45:01 2009
Dump End Marker found: this dump is valid.
How to obtain information about a dump (cont'd)
To display information about the dump itself (the dump header) and check whether the dump is valid, use lcrash with the options '-i' and '-d':
root@larsson:~> lcrash -i -d /dev/dasdb1
Dump Type: s390 standalone dump
Machine: s390x (ESAME)
CPU ID: 0xff00012320978000
Memory Start: 0x0
Memory End: 0x40000000
Memory Size: 1073741824
Time of dump: Thu Oct 8 15:44:49 2009
Number of pages: 262144
Kernel page size: 4096
Version number: 3
Magic number: 0xa8190173618f23fd
Dump header size: 4096
Dump level: 0x4
Build arch: s390x (ESAME)
Time of dump end: Thu Oct 8 15:45:01 2009
End Marker found! Dump is valid!
Automatic dump on panic (SLES 10/11, RHEL 5/6): dumpconf
The dumpconf tool configures a dump device that is used for an automatic dump in case of a kernel panic.
– The command can be installed as a service script under /etc/init.d/dumpconf or can be called manually
– Start the service: # service dumpconf start
– It reads the configuration file /etc/sysconfig/dumpconf
– Example configuration for a CCW dump device (DASD) with re-IPL after dump:
ON_PANIC=dump_reipl
DUMP_TYPE=ccw
DEVICE=0.0.4711
Automatic dump on panic (SLES 10/11, RHEL 5): dumpconf (cont'd)
– Example configuration for an FCP dump device (SCSI disk):
ON_PANIC=dump
DUMP_TYPE=fcp
DEVICE=0.0.4714
WWPN=0x5005076303004712
LUN=0x4047401300000000
BOOTPROG=0
BR_LBA=0
– Example configuration for re-IPL without taking a dump, if a kernel panic occurs:
ON_PANIC=reipl
– Example of executing a CP command, and rebooting from device 4711, if a kernel panic occurs:
ON_PANIC=vmcmd
VMCMD_1="MSG <vmguest> Starting VMDUMP"
VMCMD_2="VMDUMP"
VMCMD_3="IPL 4711"
Get dump and send it to service organization
DASD/Tape:
– Store the dump to the Linux file system from the dump device:
root@larsson:~> zgetdump /dev/<device node> > dump_file
– Alternative: lcrash (compression possible):
root@larsson:~> lcrash -d /dev/dasd<xx> -s <dir>
SCSI:
– Get the dump from the filesystem
Additional files needed for dump analysis:
– SUSE (lcrash tool): /boot/System.map-xxx and /boot/Kerntypes-xxx
– Red Hat & SUSE (crash tool): vmlinux file (kernel with debug info) contained in the debug kernel rpms:
  • Red Hat: kernel-debuginfo-xxx.rpm and kernel-debuginfo-common-xxx.rpm
  • SUSE: kernel-default-debuginfo-xxx.rpm
Handling large dumps
Compress the dump and split it into parts of 1 GB:
root@larsson:~> zgetdump /dev/dasdc1 | gzip | split -b 1G
Several compressed files such as xaa, xab, xac, ... are created.
Create md5 sums of the compressed files:
root@larsson:~> md5sum xa* > dump.md5
Upload all parts together with the md5 information.
Verification of the parts by the receiver:
root@larsson:~> md5sum -c dump.md5
xaa: OK
[...]
Merge the parts and uncompress the dump:
root@larsson:~> cat xa* | gunzip -c > dump
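The compress/split/verify/merge round trip above can be rehearsed on any small file before trying it on a multi-gigabyte dump. A sketch using a generated file in place of real zgetdump output, with 16 KB parts instead of 1 GB:

```shell
# Rehearsal on throwaway data: same pipeline shape as the slide, tiny sizes.
workdir=$(mktemp -d) && cd "$workdir"
head -c 100000 /dev/urandom > dump        # stand-in for zgetdump output
gzip -c dump | split -b 16k -             # parts xaa, xab, ... (slide: -b 1G)
md5sum xa* > dump.md5                     # ship checksums with the parts
md5sum -c dump.md5                        # receiver: verify each part
cat xa* | gunzip -c > dump.merged         # receiver: reassemble the dump
cmp dump dump.merged && echo "round trip OK"
```

The final cmp confirms the reassembled file is byte-identical to the original, which is exactly the guarantee you want before deleting the source dump.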
Transferring dumps
Transferring single-volume dumps with ssh:
root@larsson:~> zgetdump /dev/dasdc1 | ssh user@host "cat > dump_file_on_target_host"
Transferring multi-volume dumps with ssh:
root@larsson:~> zgetdump /dev/dasdc | ssh user@host "cat > multi_volume_dump_file_on_target_host"
Transferring a dump with ftp:
– Establish an ftp session with the target host, log in, and set the transfer mode to binary
– Send the dump to the host:
ftp> put |"zgetdump /dev/dasdc1" <dump_file_on_target_host>
Dump tool summary
See “Using the dump tools” book on http://www.ibm.com/developerworks/linux/linux390/documentation_dev.html
Tools: stand-alone tools (DASD, Tape, SCSI) and VMDUMP
– Environment: stand-alone tools: VM & LPAR; VMDUMP: VM
– Preparation: DASD: zipl -d /dev/<dump_dev>; Tape: ---; SCSI: mkdir /dumps/mydumps; zipl -D /dev/sda1 ...; VMDUMP: ---
– Creation: stand-alone tools: stop CPU & store status, ipl <dump_dev_CUU>; VMDUMP: vmdump
– Dump medium: DASD: ECKD or FBA; Tape: tape cartridges; SCSI: Linux file system on a SCSI disk; VMDUMP: VM reader
– Copy to filesystem: stand-alone tools: zgetdump /dev/<dump_dev> > dump_file; VMDUMP: Dumpload, ftp ..., vmconvert ...
– Viewing: lcrash or crash
Customer Cases
Availability: Guest spontaneously reboots
Configuration:
– Oracle RAC server or other HA solution under z/VM
Problem description:
– Occasionally guests spontaneously reboot without any notification or console message
Tools used for problem determination:
– CP instruction trace of the (re)IPL code
– Crash dump taken after the trace was hit
(Diagram: two Linux guests in an HA cluster, each running an Oracle RAC server, communicating with the Oracle RAC database.)
Availability: Guest spontaneously reboots - Steps to find root cause
Question: Who rebooted the system?
Step 1:
– Find out the address of the (re)IPL code in the System.map
– Use this address to set an instruction trace
cd /boot
grep machine_restart System.map-2.6.16.60-0.54.5-default
000000000010c364 T machine_restart
00000000001171c8 t do_machine_restart
0000000000603200 D _machine_restart
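Looking up the address and building the trace command can be scripted. A sketch that uses the System.map entries from the slide as sample data (on a real system, grep /boot/System.map-$(uname -r) instead):

```shell
# Sample System.map fragment copied from the slide for demonstration.
map=$(mktemp)
cat > "$map" <<'EOF'
000000000010c364 T machine_restart
00000000001171c8 t do_machine_restart
0000000000603200 D _machine_restart
EOF

# Pick the address whose symbol name is exactly machine_restart.
addr=$(awk '$3 == "machine_restart" { print $1 }' "$map")

# Drop the leading zeros, as the 3270 console form of the address has.
short=$(printf '%X' "0x$addr")
echo "CP CPU ALL TR IN R $short.4"
```

The printed line matches the CP TRACE command used in the next step.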
Availability: Guest spontaneously reboots - Steps to find root cause (cont'd)
Step 2:
– Set a CP instruction trace on the reboot address
– The system is halted at that address when a reboot is triggered
CP CPU ALL TR IN R 10C364.4
HCPTRI1027I An active trace set has turned RUN off
CP Q TR
NAME INITIAL (ACTIVE)
 1 INSTR PSWA 0010C364-0010C367 TERM NOPRINT NORUN SIM SKIP 00000 PASS 00000 STOP 00000 STEP 00000 CMD NONE
-> 000000000010C364' STMF EBCFF0780024 >> 000000003A557D48 CC 2
Availability: Guest spontaneously reboots - Steps to find root cause (cont'd)
Step 3:
– Take a dump when the (re)IPL code is hit
cp cpu all stop
cp store status
Store complete.
cp i 4fc6
Tracing active at IPL
HCPGSP2630I The virtual machine is placed in CP mode due to a SIGP stop and store status from CPU 00.
zIPL v1.6.3-0.24.5 dump tool (64 bit)
Dumping 64 bit OS
00000128 / 00001024 MB
......
00001024 / 00001024 MB
Dump successful
HCPIR450W CP entered, disabled wait PSW 00020000 80000000 00000000 00000000
Availability: Guest spontaneously reboots - Steps to find root cause (cont'd)
Step 4:
– Save the dump in a file
zgetdump /dev/dasdb1 > dump_file
Dump device: /dev/dasdb1
>>> Dump header information <<<
Dump created on: Wed Oct 27 12:00:40 2010
Magic number: 0xa8190173618f23fd
Version number: 4
Header size: 4096
Page size: 4096
Dumped memory: 1073741824
Dumped pages: 262144
Real memory: 1073741824
cpu id: 0xff00012320948000
System Arch: s390x (ESAME)
Build Arch: s390x (ESAME)
>>> End of Dump header <<<
Reading dump content ................................
Dump ended on: Wed Oct 27 12:00:52 2010
Dump End Marker found: this dump is valid.
Availability: Guest spontaneously reboots - Steps to find root cause (cont'd)
Step 5:
– Use (l)crash to find out which process triggered the reboot
STACK:
 0 start_kernel+950 [0x6a690e]
 1 _stext+32 [0x100020]
================================================================
TASK HAS CPU (1): 0x3f720650 (oprocd.bin):
LOWCORE INFO:
 psw : 0x0704200180000000 0x000000000010c36a
 function : machine_restart+6
 prefix : 0x3f438000
 cpu timer: 0x7fffffff 0xff9e6c00
 clock cmp: 0x00c6ca69 0x22337760
 general registers:
<snip>
STACK:
 0 __handle_sysrq+248 [0x361240]
 1 write_sysrq_trigger+98 [0x2be796]
 2 sys_write+392 [0x225a68]
 3 sysc_noemu+16 [0x1179a8]
The reboot was triggered by:
/var/opt/oracle/product/crs/bin/oprocd.bin
Availability: Guest spontaneously reboots (cont'd)
Problem origin:
– The HA component erroneously detected a system hang
  • The hangcheck_timer module did not receive its timer IRQ
  • z/VM 'time bomb' switch
  • TSA monitor
z/VM cannot guarantee 'real-time' behavior if overloaded:
– Longest 'hang' observed: 37 seconds(!)
Solution:
– Offload the HA workload from the overloaded z/VM
  • e.g. use a separate z/VM
  • or: run large Oracle RAC guests in LPAR
Network: network connection is too slow
Configuration:
– z/VSE running CICS, connected to DB2 on Linux on System z
– HiperSockets connection from Linux to z/VSE
– But also applies to HiperSockets connections between Linux and z/OS
Problem description:
– When CICS transactions were monitored, some transactions took a couple of seconds instead of milliseconds
Tools used for problem determination:
– dbginfo.sh
– s390 debug feature
– sadc/sar
– CICS transaction monitor
Network: network connection is too slow (cont'd)
s390 debug feature:
– Check for qeth errors:
cat /sys/kernel/debug/s390dbf/qeth_qerr
00 01282632346:099575 2 00 0000000180b20218 71 6f 75 74 65 72 72 00 | qouterr.
00 01282632346:099575 2 00 0000000180b20298 20 46 31 35 3d 31 30 00 | F15=10.
00 01282632346:099576 2 00 0000000180b20318 20 46 31 34 3d 30 30 00 | F14=00.
00 01282632346:099576 2 00 0000000180b20390 20 71 65 72 72 3d 41 46 | qerr=AF
00 01282632346:099576 2 00 0000000180b20408 20 73 65 72 72 3d 32 00 | serr=2.
dbginfo file:
– Check the buffer count:
cat /sys/devices/qeth/0.0.1e00/buffer_count
16
Problem origin:
– Too few inbound buffers
Network: network connection is too slow (cont'd)
Solution:
– Increase the inbound buffer count (default: 16, max: 128)
– Check the actual buffer count with 'lsqeth -p'
– Set the inbound buffer count in the appropriate config file:
  • SUSE SLES10: in /etc/sysconfig/hardware/hwcfg-qeth-bus-ccw-0.0.F200 add
    QETH_OPTIONS="buffer_count=128"
  • SUSE SLES11: in /etc/udev/rules.d/51-qeth-0.0.f200.rules add
    ACTION=="add", SUBSYSTEM=="ccwgroup", KERNEL=="0.0.f200", ATTR{buffer_count}="128"
  • Red Hat: in /etc/sysconfig/network-scripts/ifcfg-eth0 add
    OPTIONS="buffer_count=128"
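The three distro variants set the same attribute through different files. A small helper (hypothetical, for illustration only; the emitted lines mirror the slide) that prints the matching config line for a given distro, device ID and buffer count:

```shell
# buffer_count_line <sles10|sles11|redhat> <devid> <count>
# Prints the line to append to that distro's qeth config file.
buffer_count_line() {
  case "$1" in
    sles10) echo "QETH_OPTIONS=\"buffer_count=$3\"" ;;
    sles11) echo "ACTION==\"add\", SUBSYSTEM==\"ccwgroup\", KERNEL==\"$2\", ATTR{buffer_count}=\"$3\"" ;;
    redhat) echo "OPTIONS=\"buffer_count=$3\"" ;;
    *)      return 1 ;;
  esac
}

buffer_count_line sles11 0.0.f200 128
```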
High Disk response times
Configuration:
– z10, HDS storage server (Hyper PAV enabled)
– z/VM, Linux with Oracle database
– VM-controlled minidisks attached to Linux, LVM on top
Problem description:
– I/O throughput not matching expectations
– The Oracle database shows poor performance because of that
– One LVM volume showing significant stress
Tools used for problem determination:
– dbginfo.sh
– sadc/sar
– z/VM monitor data
High Disk response times (cont'd) – Observation in Linux
PAV not being utilized
No Hyper PAV support in SLES10 SP2
Static PAV not possible with current setup (VM controlled minidisks)
Need to look for other ways for more parallel I/O
– Link the same minidisk multiple times to the guest
– Use smaller minidisks and increase striping in Linux
Conclusion
dm-9 0.00 0.00 49.75 0.00 19790.50 0.00 795.56 17.89 15.79 2.01 100.00
(Slide callouts mark the throughput, response time, and utilization columns.)
High Disk response times (cont'd) – Initial and proposed setup
(Diagram: physical disk → logical disk(s) in VM → logical disk(s) in Linux.)
1. Initial setup
2. Link minidisks to the guest multiple times (multipath)
3. Smaller disks, more stripes (striping via LVM / device mapper)
High Disk response times (cont'd) – New observation in Linux
Response times stay equal
Throughput equal
No PAV being used!!
High Disk response times (cont'd) – Solution: check the PAV setup in VM
High Disk response times (cont'd) – Finally:
dasdaen 133.33 0.00 278.11 0.00 12298.51 0.00 88.44 0.79 2.83 1.48 41.29
dasdcbt 161.19 0.00 248.26 0.00 12260.70 0.00 98.77 0.91 3.47 1.88 46.77
dasdfwc 149.75 0.00 266.17 0.00 12374.13 0.00 92.98 1.88 7.07 2.54 67.66
dasdael 162.19 0.00 250.25 0.00 12483.58 0.00 99.77 1.90 7.57 2.86 71.64
dasddyz 134.83 0.00 277.61 0.00 12431.84 0.00 89.56 0.75 2.71 1.68 46.77
dasdaem 151.24 0.00 266.17 0.00 12595.02 0.00 94.64 2.01 7.61 2.82 75.12
dasdcbr 169.65 0.00 242.79 0.00 12386.07 0.00 102.03 1.72 7.05 2.83 68.66
dasdfwd 162.69 0.00 249.25 0.00 12348.26 0.00 99.08 1.92 7.70 2.83 70.65
dasddyy 157.21 0.00 259.70 0.00 12409.95 0.00 95.57 2.58 9.96 3.05 79.10
dasddyx 174.63 0.00 237.81 0.00 12374.13 0.00 104.07 1.76 7.38 2.93 69.65
dasdcbs 144.78 0.00 272.14 0.00 12264.68 0.00 90.14 2.53 9.31 2.89 78.61
dasda 0.00 0.00 0.00 1.00 0.00 3.98 8.00 0.01 10.00 5.00 0.50
dasdq 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dasdss 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dasdadx 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.
dasdawh 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dasdamk 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dasdaek 160.70 0.00 255.22 0.00 12382.09 0.00 97.03 2.27 8.95 2.88 73.63
dasdcbq 148.76 0.00 265.67 0.00 12372.14 0.00 93.14 2.14 8.01 2.85 75.62
dasddyw 162.19 0.00 254.23 0.00 12384.08 0.00 97.42 2.12 8.40 2.90 73.63
dasdfwe 146.27 0.00 271.64 0.00 12419.90 0.00 91.44 2.63 9.71 2.80 76.12
dasdfwf 162.19 0.00 249.75 0.00 12455.72 0.00 99.75 0.71 2.83 1.79 44.78
dasdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 1646.77 0.00 49494.53 0.00 60.11 5.08 3.04 0.36 59.70
dm-1 0.00 0.00 1665.17 0.00 49482.59 0.00 59.43 15.00 9.04 0.56 93.53
dm-2 0.00 0.00 1660.70 0.00 49432.84 0.00 59.53 13.46 8.11 0.55 90.55
dm-3 0.00 0.00 1647.26 0.00 49490.55 0.00 60.09 12.05 7.32 0.53 87.56
dm-4 0.00 0.00 1646.77 0.00 49494.53 0.00 60.11 5.08 3.04 0.36 59.70
dm-5 0.00 0.00 1665.17 0.00 49482.59 0.00 59.43 15.00 9.04 0.56 93.53
dm-6 0.00 0.00 1660.70 0.00 49432.84 0.00 59.53 13.46 8.11 0.55 90.55
dm-7 0.00 0.00 1647.26 0.00 49490.55 0.00 60.09 12.06 7.32 0.53 87.56
dm-8 0.00 0.00 0.00 1.99 0.00 7.96 8.00 0.00 0.00 0.00 0.00
dm-9 0.00 0.00 497.51 0.00 197900.50 0.00 795.56 7.89 15.79 2.01 100.00
Bonding throughput not matching expectations
Configuration:
– SLES10 system, connected via an OSA card and using the bonding driver
Problem description:
– Bonding only working at 100 Mbps:
bonding: bond1: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full
– FTP also slow
Tools used for problem determination:
– dbginfo.sh, netperf
Problem origin:
– ethtool cannot determine the line speed correctly because qeth does not report it
Solution:
– Ignore the 100 Mbps message, or upgrade to SLES11
Availability: Unable to mount file system after LVM changes
Configuration:
– Linux HA cluster with two nodes
– Accessing the same DASDs, which are exported via OCFS2
Problem description:
– Added one node to the cluster and brought the logical volume online
– Unable to mount the filesystem from any node after that
Tools used for problem determination:
– dbginfo.sh
(Diagram: two Linux nodes sharing DASDs in one logical volume with OCFS2 on top.)
Availability: Unable to mount file system after LVM changes (cont'd)
Problem origin:
– The LVM metadata was overwritten when adding the third node ({pv|vg|lv}create)
– e.g. superblock not found
Solution:
– Extract the metadata from a running node (/etc/lvm/backup) and write it to disk again
(Diagram: a third Linux node's {pv|vg|lv}create overwrote the metadata of the logical volume shared by the two OCFS2 nodes.)
Kernel panic: Low address protection
Configuration:
– z10 only
– High workload
– The more multithreaded applications are running, the more likely it is to occur
Problem description:
– Concurrent access to pages being removed from the page table
Tools used for problem determination:
– crash/lcrash
Problem origin:
– Race condition in memory management in the firmware
Solution:
– Upgrade to the latest firmware!
– Upgrade to the latest kernels – fix to be integrated into all supported distributions