X64 Workshop: Linux Information Gathering
2. Agenda
- Support model and structure
3. Agenda (cont)
- System Core Dump Capturing
4. Linux Support Overview
- Linux Support from System TSC organization:
5. Linux Support Overview (cont)
- Supported Linux Versions:
- Red Hat Enterprise Linux (RHEL)
- Only existing contracts, no new contracts after 30.09.2006.
- Novell/SuSE Linux Enterprise (SLES)
- Back-line support from the vendor is available
- We have a path to escalate issues to Red Hat or Novell/SuSE.
6. Linux Support Overview (cont)
- What is covered by support?
- Bugs within the OS or with core applications
- Configuration of the system
- Self-compiled kernels, tainted modules
- Sun does not fix bugs within any distribution; that is up to the vendor.
7. Data Collection
- We need the entitlement for the Linux distribution the customer has installed
- General thoughts about data collection
- The issue must be visible within the data.
- Has anything changed on the system? Collect new data!
- And the data must be understandable.
8. Data Collection (cont)
- Customer has a working and a non-working system
- Collect data from both systems
- Customer has changed the configuration on the advice of Sun Support, but this doesn't work.
- Collect all relevant data from the system again to see what was changed.
- Customer applies online updates to the system, but the issue isn't fixed.
- We need the data from the system again to see which updates were applied.
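One way to compare a working and a non-working system, or one system before and after updates, is to diff their package lists. The sketch below assumes each list was saved with `rpm -qa | sort > packages.<hostname>.txt`; the function name `pkg_diff` and the file naming are hypothetical, not part of any support tool.

```shell
#!/bin/sh
# Hypothetical helper: compare two saved package lists.
# Prints "< pkg" for packages only in the first list and
# "> pkg" for packages only in the second list.
pkg_diff() {
    list_a=$1
    list_b=$2
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    # comm needs both inputs sorted in the same collation order
    LC_ALL=C sort "$list_a" > "$tmp_a"
    LC_ALL=C sort "$list_b" > "$tmp_b"
    LC_ALL=C comm -23 "$tmp_a" "$tmp_b" | sed 's/^/< /'
    LC_ALL=C comm -13 "$tmp_a" "$tmp_b" | sed 's/^/> /'
    rm -f "$tmp_a" "$tmp_b"
}
```

Usage: `pkg_diff packages.good.txt packages.bad.txt`. An empty output means both systems have identical package sets.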
9. Data Collection (cont)
- Mandatory for escalating to Red Hat
- Lacks some interesting information.
- Insufficient messages etc.
- Collects much more information than siga.
10. Data Collection (cont)
- Most complete data collection
- We are in discussion with SuSE to also accept this data set instead of siga/config.sh
- Not accepted by Red Hat for escalation
11. Data Analysis
- There is no automatic tool!
- This presentation is by no means complete.
- Determine the Linux version:
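A minimal sketch for determining the version, assuming the usual release files of these distributions (`/etc/redhat-release` on RHEL, `/etc/SuSE-release` on SLES). The function name `linux_release` and the optional root-directory argument, meant for looking into an unpacked data collection, are hypothetical.

```shell
#!/bin/sh
# Print the first line of the distribution release file found under the
# given root directory (default: the live system).
linux_release() {
    root=${1:-}
    for f in /etc/redhat-release /etc/SuSE-release; do
        if [ -r "$root$f" ]; then
            head -n 1 "$root$f"
            return 0
        fi
    done
    echo "unknown distribution (no release file found)" >&2
    return 1
}
```

On a live system, `linux_release` together with `uname -r` gives the distribution release and the running kernel version.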
12. Data Analysis (cont)
- What packages are installed? Which version?
- RPM is the package manager of RHEL and SLES.
- rpm -qaV (takes some time)
- Looking at the sar data (package sysstat) shows the system load at the time an issue occurs
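To look at the load around the time of an issue, sar can read the daily data file for that day. The sketch below only prints a command line for review. It assumes the daily files live in `/var/log/sa/saDD`, which is the typical sysstat location but may differ on a given system; the function name `sar_cmd` is hypothetical.

```shell
#!/bin/sh
# Build the sar command line for a given day of month and time window.
# -u reports CPU utilisation, -q reports run queue and load average.
sar_cmd() {
    day=${1#0}                  # strip a leading zero so 08 is not read as octal
    start=$2                    # hh:mm:ss
    end=$3                      # hh:mm:ss
    sa_dir=${4:-/var/log/sa}    # assumed default location
    printf 'sar -u -q -f %s/sa%02d -s %s -e %s\n' "$sa_dir" "$day" "$start" "$end"
}
```

Usage: `sar_cmd 7 13:50:00 14:10:00` prints the command for the seventh day of the month between 13:50 and 14:10.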
13. Data Analysis (cont)
- Hardware/Firmware information
- Python script wrapping dmidecode (RH only, may be included in sysreport)
- e.g. firmware of SCSI disk in /proc/scsi/scsi
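The firmware revision of each SCSI device appears in the `Rev:` field of `/proc/scsi/scsi`. A minimal sketch that extracts vendor, model and revision from that file, or from a copy of it in a data collection; the function name `scsi_firmware` and the `|` separated output format are hypothetical choices.

```shell
#!/bin/sh
# Print "vendor|model|firmware revision" for each device listed in a
# /proc/scsi/scsi style file (default: the live system).
scsi_firmware() {
    sed -n 's/^ *Vendor: *\(.*[^ ]\) *Model: *\(.*[^ ]\) *Rev: *\(.*[^ ]\) *$/\1|\2|\3/p' \
        "${1:-/proc/scsi/scsi}"
}
```

Usage: `scsi_firmware` on the live system, or `scsi_firmware path/to/collected/scsi` on saved data.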
14. Data Analysis (cont)
15. System Core Dump Capturing
- No standard at the moment
- Kdump has found its way into the mainline kernel.
- RHEL 3 / 4 use their own tools
- A separate resident kernel with a small footprint
- Highly flexible and reliable
16. System Core Dump Capturing (cont)
- Based on an IBM/SGI implementation.
- A separate resident kernel with a small footprint
- Highly flexible and reliable
17. Setting up RHEL 3 & 4 Netdump
- Install package netdump-server
- Normally no configuration needed.
- Install package netdump-client
- Configure /etc/sysconfig/netdump
- "service netdump propagate"
18. Setting up RHEL 5 Kdump
- Partitions: ext2 / ext3 / raw
- Quite easy to set up with the GUI dialog
19. Setting up SLES 8/9 LKCD
- Install required package:
- # insserv /etc/init.d/boot.lkcd
20. Setting up SLES 10 Kdump
- Edit /etc/sysconfig/kdump
- Enable kdump init service
- Add boot option "crashkernel=64M@16M"
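After rebooting with the new boot option, it is worth confirming that the kernel was really started with it. A minimal sketch that reads the `crashkernel=` value from `/proc/cmdline`; the function name `crashkernel_value` is hypothetical.

```shell
#!/bin/sh
# Print the crashkernel= value from a kernel command line file
# (default: /proc/cmdline). Returns non-zero if the option is not set.
crashkernel_value() {
    value=$(tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^crashkernel=//p' | head -n 1)
    [ -n "$value" ] && echo "$value"
}
```

Usage: `crashkernel_value || echo "no crashkernel= option, kdump cannot reserve memory"`.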
21. Checking dump setup
- Check that everything fits together:
- Enable Magic SysRq feature temporarily
- echo "1" > /proc/sys/kernel/sysrq
- Trigger a test crash, which should produce a dump
- echo "c" > /proc/sysrq-trigger
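Because writing "c" to the trigger file crashes the machine on purpose, a small wrapper with an explicit confirmation can prevent accidents. This is a sketch only: the function name `dump_test`, the confirmation word `crash-now` and the optional path arguments (there so the logic can be dry-run against ordinary files) are hypothetical.

```shell
#!/bin/sh
# Trigger a test crash to verify the dump setup. THIS CRASHES THE SYSTEM
# when run against the real /proc files, so it refuses to do anything
# unless the first argument is the literal word "crash-now".
dump_test() {
    confirm=${1:-}
    sysrq_ctl=${2:-/proc/sys/kernel/sysrq}
    trigger=${3:-/proc/sysrq-trigger}
    if [ "$confirm" != "crash-now" ]; then
        echo "refusing: pass 'crash-now' to really crash this system" >&2
        return 1
    fi
    echo 1 > "$sysrq_ctl" || return 1   # enable SysRq until next reboot
    echo c > "$trigger"                 # force a kernel crash
}
```

Run it only on a system where an outage is acceptable, and check afterwards that a dump was actually written.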
22. Linux SysRq Feature
- The Magic SysRq feature is somewhat similar to Stop-A on Solaris
- It can force the kernel to print out or dump information about the system
- Sometimes really helpful for troubleshooting
- May even work if the system seems to hang
23. Linux SysRq Feature (cont)
- Disabled by default, needs to be enabled
- Temporarily, until next reboot:
- echo "1" > /proc/sys/kernel/sysrq
- Permanently:
- edit /etc/sysctl.conf to add the line: kernel.sysrq = 1
- Issue locally on the keyboard with Alt+SysRq plus the command key
- Issue remotely by echoing the command key into /proc/sysrq-trigger
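The two points above, the persistent setting and the remote trigger, can be sketched as small helpers. The function names `sysrq_persistent` and `sysrq_send` are hypothetical. The optional file arguments let both be tried against ordinary files first.

```shell
#!/bin/sh
# Return 0 if the given sysctl configuration file enables SysRq
# persistently (kernel.sysrq = 1). Commented-out lines do not count.
sysrq_persistent() {
    grep -Eq '^[[:space:]]*kernel\.sysrq[[:space:]]*=[[:space:]]*1[[:space:]]*$' \
        "${1:-/etc/sysctl.conf}"
}

# Send one SysRq command key through the trigger file, as an alternative
# to the keyboard combination. Rejects anything but a single character.
sysrq_send() {
    key=${1:-}
    trigger=${2:-/proc/sysrq-trigger}
    case $key in
        [a-z0-9]) echo "$key" > "$trigger" ;;
        *) echo "usage: sysrq_send <key> [trigger-file]" >&2; return 2 ;;
    esac
}
```

Usage: `sysrq_persistent || echo "SysRq will be disabled again after reboot"`.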
24. Linux SysRq Feature (cont)
- Calls the Secure Attention Key (SAK) function. SAK terminates every process running on the current console to clean up the terminal.
- Synchronizes all hard disks.
- Remounts all hard disks in read-only mode. This prevents data loss when the system is in an unstable state.
- Shows the current task list.
25. Linux SysRq Feature (cont)
- Reboots the system immediately. You should synchronize and remount the hard disks read-only before restarting the system.
- Prints out the current register contents.
- Prints out the memory information.
- For a complete list, look up sysrq.txt in the kernel documentation
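The recommended order (synchronize, remount read-only, then reboot) can be sketched as a sequence of writes to the trigger file. The key letters used here, s for sync, u for remount read-only and b for reboot, are taken from the kernel's sysrq documentation. The function name `sysrq_safe_reboot`, the confirmation word `reboot-now` and the 5 second pause are assumptions made for illustration.

```shell
#!/bin/sh
# Emergency reboot: sync, remount read-only, then reboot, with a pause
# between steps so each one can complete. REBOOTS THE SYSTEM when used
# with the real trigger file, so it requires the word "reboot-now".
sysrq_safe_reboot() {
    confirm=${1:-}
    trigger=${2:-/proc/sysrq-trigger}
    pause=${3:-5}    # seconds between steps (assumed value)
    [ "$confirm" = "reboot-now" ] || { echo "refusing without 'reboot-now'" >&2; return 1; }
    for key in s u b; do
        echo "sending SysRq $key"
        echo "$key" > "$trigger" || return 1
        sleep "$pause"
    done
}
```

This is a last resort for a system that no longer responds to a normal shutdown; a regular `shutdown -r` is always preferable when it still works.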
26. Crash Dump Analysis
- Supports various dump formats
- Kdump, LKCD, Net/Disk dump
- Can examine the kernel of a live system
- http://people.redhat.com/~anderson/
27. Crash Dump Analysis (cont)
- You need the debug information of the kernel
- The crash package needs to be installed
- Load the vmcore for analysis
- crash System.map vmlinux vmcore
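Before starting crash it helps to confirm that all three files are present and readable, since a missing debug kernel is a common stumbling block. The sketch below checks the files and prints the command line in the order shown above, without executing anything; the function name `crash_cmd` is hypothetical.

```shell
#!/bin/sh
# Check that the three files crash needs are readable, then print the
# command line. Nothing is executed.
crash_cmd() {
    [ $# -eq 3 ] || { echo "usage: crash_cmd System.map vmlinux vmcore" >&2; return 2; }
    for f in "$@"; do
        [ -r "$f" ] || { echo "missing or unreadable: $f" >&2; return 1; }
    done
    echo "crash $*"
}
```

Usage: `crash_cmd System.map vmlinux vmcore`, then run the printed command once it looks right.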
28. Troubleshoot a Hanging System
- Hard to troubleshoot due to lack of information
- If the kernel is deadlocked, the NMI watchdog may help
- Add the kernel boot parameter nmi_watchdog=1 to the GRUB configuration.
- When a system lockup is detected, a kernel panic is initiated.
- There might be a chance to force a dump (SysRq) while the system is hanging
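Whether the NMI watchdog is actually running can be checked from the NMI counters in `/proc/interrupts`: with a working watchdog they should keep increasing. A minimal sketch that sums the per-CPU counters; the function name `nmi_count` is hypothetical, and how fast the counters rise depends on the watchdog mode and the system load.

```shell
#!/bin/sh
# Sum the per-CPU NMI counters from a /proc/interrupts style file
# (default: the live system). Returns non-zero if there is no NMI line.
nmi_count() {
    awk '$1 == "NMI:" {
             for (i = 2; i <= NF; i++) if ($i ~ /^[0-9]+$/) n += $i
             found = 1
         }
         END { if (!found) exit 1; print n + 0 }' "${1:-/proc/interrupts}"
}
```

Usage: run `nmi_count` twice, a few seconds apart, and compare the two values. A counter that stays at zero suggests the watchdog is not active.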
29. Links
- Linux Explorer http://www.unix-consultants.co.uk/examples/scripts/linux/linux-explorer/
- LKCD Setup on SLES http://www.novell.com/coolsolutions/feature/15284.html
- Crash Utility http://people.redhat.com/~anderson/
- System TSC Linux pages http://systems-tsc/twiki/bin/view/Teams/LinuxDataGathering
- PTS Linux pages (outdated) http://barentz.germany.sun.com/ptsvs/Wiki.jsp?page=LinuxHowTos
30. Links
- Did you know http://www.google.com/linux?
31. X64 Workshop: Linux Information Gathering