8/8/2019 System Dump
1/5
How do I analyze a system dump?
faq52-1546
Posted: 7 Feb 02 (Edited 14 Apr 02)
1. System Dump
1.1 Overview
During a system crash (flashing 888 on the LCD), the dump process is
invoked. It stores the entire kernel segment that resides in real
memory (the kernel segment is segment 0) on disk for later debugging.
In effect, it creates a snapshot core image of the machine at the
moment of the crash and saves it to disk.
The core also includes the memory-resident user data (such as u-blocks).
A successful dump is indicated by 0c0 on the LCD.
1.2 Dump Status Codes
The following dump progress indicators, or dump status codes, are part
of a Type 102 message.
Note: When a lowercase c is listed, it displays in the lower half of
the character position. Some systems produce 4-digit codes; the two
leftmost positions can be blanks or zeros. Use the two rightmost digits.
0c0 The dump completed successfully.
0c1 The dump failed due to an I/O error.
0c2 A dump, requested by the user, is started.
0c3 The dump is inhibited.
0c4 The dump device is not large enough.
0c5 The dump did not start, or the dump crashed.
0c6 Dumping to a secondary dump device.
0c7 Reserved.
0c8 The dump function is disabled.
0c9 A dump is in progress.
0cc Unknown dump failure.
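The lookup in the table above can be sketched as a small shell function. This is only an illustration: the function name dump_status_text is made up for this example, and the messages are simply copied from the table. It keeps the two rightmost characters of the code, as the note above suggests for 4-digit displays.

```shell
#!/bin/sh
# Hypothetical helper (not an AIX command): translate a dump status code
# read from the LCD into the text from the table above.
dump_status_text() {
    # Keep only the two rightmost characters, so "0c0" and "00c0"
    # are treated the same way.
    code=$(printf '%s\n' "$1" | sed 's/.*\(..\)$/\1/')
    case "$code" in
        c0) echo "The dump completed successfully." ;;
        c1) echo "The dump failed due to an I/O error." ;;
        c2) echo "A dump, requested by the user, is started." ;;
        c3) echo "The dump is inhibited." ;;
        c4) echo "The dump device is not large enough." ;;
        c5) echo "The dump did not start, or the dump crashed." ;;
        c6) echo "Dumping to a secondary dump device." ;;
        c7) echo "Reserved." ;;
        c8) echo "The dump function is disabled." ;;
        c9) echo "A dump is in progress." ;;
        cc) echo "Unknown dump failure." ;;
        *)  echo "Not a dump status code: $1" ;;
    esac
}

dump_status_text 0c0     # The dump completed successfully.
dump_status_text 00c4    # The dump device is not large enough.
```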
1.3 Crash Codes
Note: Some systems may produce 4-digit codes. If the leftmost digit of
a 4-digit code is 0, use the three rightmost digits.
The following crash codes are part of a Type 102 message.
000 Unexpected system interrupt.
200 Machine check because of a memory bus error.
201 Machine check because of a memory timeout.
202 Machine check because of a memory card failure.
203 Machine check because of an out-of-range address.
204 Machine check because of an attempt to write to ROS.
205 Machine check because of an uncorrectable address parity.
206 Machine check because of an uncorrectable ECC error.
207 Machine check because of an unidentified error.
208 Machine check due to an L2 uncorrectable ECC.
300 Data storage interrupt from the processor.
32x Data storage interrupt because of an I/O exception from IOCC.
38x Data storage interrupt because of an I/O exception from SLA.
400 Instruction storage interrupt.
500 External interrupt because of a scrub memory bus error.
501 External interrupt because of an unidentified error.
51x External interrupt because of a DMA memory bus error.
52x External interrupt because of an IOCC channel check.
53x External interrupt from an IOCC bus timeout; x represents the IOCC number.
54x External interrupt because of an IOCC keyboard check.
558 There is not enough memory to continue the IPL.
700 Program interrupt.
800 Floating point is not available.
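The crash code table can be turned into a lookup in the same way. The sketch below is an illustration only: crash_code_text is a made-up name, the messages are copied from the table, and the trailing x in codes such as 53x is matched with a shell wildcard. A 4-digit code with a leading 0 is reduced to its three rightmost digits first, as described in the note above.

```shell
#!/bin/sh
# Hypothetical helper (not an AIX command): translate a crash code
# into the text from the table above.
crash_code_text() {
    code=$1
    # A 4-digit code with a leading 0 is reduced to three digits.
    case "$code" in
        0???) code=${code#0} ;;
    esac
    case "$code" in
        000) echo "Unexpected system interrupt." ;;
        200) echo "Machine check because of a memory bus error." ;;
        201) echo "Machine check because of a memory timeout." ;;
        202) echo "Machine check because of a memory card failure." ;;
        203) echo "Machine check because of an out-of-range address." ;;
        204) echo "Machine check because of an attempt to write to ROS." ;;
        205) echo "Machine check because of an uncorrectable address parity." ;;
        206) echo "Machine check because of an uncorrectable ECC error." ;;
        207) echo "Machine check because of an unidentified error." ;;
        208) echo "Machine check due to an L2 uncorrectable ECC." ;;
        300) echo "Data storage interrupt from the processor." ;;
        32?) echo "Data storage interrupt because of an I/O exception from IOCC." ;;
        38?) echo "Data storage interrupt because of an I/O exception from SLA." ;;
        400) echo "Instruction storage interrupt." ;;
        500) echo "External interrupt because of a scrub memory bus error." ;;
        501) echo "External interrupt because of an unidentified error." ;;
        51?) echo "External interrupt because of a DMA memory bus error." ;;
        52?) echo "External interrupt because of an IOCC channel check." ;;
        53?) echo "External interrupt from an IOCC bus timeout (IOCC number ${code#53})." ;;
        54?) echo "External interrupt because of an IOCC keyboard check." ;;
        558) echo "There is not enough memory to continue the IPL." ;;
        700) echo "Program interrupt." ;;
        800) echo "Floating point is not available." ;;
        *)   echo "Unknown crash code: $1" ;;
    esac
}

crash_code_text 0207   # Machine check because of an unidentified error.
crash_code_text 532    # External interrupt from an IOCC bus timeout (IOCC number 2).
```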
1.4 Enabling system dump
We can check the current dump settings via "smitty dump" or as follows:
# sysdumpdev -l
primary /dev/hd6
secondary /dev/sysdumpnull
copy directory /var/adm/ras
forced copy flag TRUE
always allow dump TRUE
dump compression OFF
This means that the system dump is enabled (always allow dump = TRUE)
and that it will be copied to /var/adm/ras.
If always allow dump = FALSE, the system dump will not be generated.
To change this, change the settings in smit:
Always ALLOW System Dump = true
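The check of the "always allow dump" setting can also be scripted. The following is a minimal sketch, assuming the sysdumpdev -l output has the layout shown in the listing above. The function name dump_allowed is made up for this example, and the sample text is copied from that listing so that the snippet runs without an AIX system.

```shell
#!/bin/sh
# Hypothetical helper: read `sysdumpdev -l` output on stdin and print the
# value of the "always allow dump" line (or UNKNOWN if the line is absent).
dump_allowed() {
    awk '/^always allow dump/ { print $NF; found = 1 }
         END { if (!found) print "UNKNOWN" }'
}

# Sample text copied from the sysdumpdev -l listing above.
sample='primary /dev/hd6
secondary /dev/sysdumpnull
copy directory /var/adm/ras
forced copy flag TRUE
always allow dump TRUE
dump compression OFF'

printf '%s\n' "$sample" | dump_allowed    # TRUE
```

On a live system the same function would be fed directly, for example with "sysdumpdev -l | dump_allowed".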
The /var file system must have enough space to accommodate a couple of
system dumps. Its size can be increased by, say, 100 MB (the size is
given in 512-byte blocks) as follows:
# chfs -a size=+200000 /var
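The arithmetic behind the size value can be sketched as follows, assuming that chfs interprets a size without a unit suffix as a count of 512-byte blocks. The helper names mb_to_blocks and blocks_to_bytes are made up for this example. It shows that +200000 blocks is a little under 100 MB, and that an exact 100 MB would be 204800 blocks.

```shell
#!/bin/sh
# Hypothetical helpers: convert between megabytes and 512-byte blocks
# (assumption: a chfs size without a unit suffix is in 512-byte blocks).
mb_to_blocks() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

blocks_to_bytes() {
    echo $(( $1 * 512 ))
}

mb_to_blocks 100         # 204800
blocks_to_bytes 200000   # 102400000
```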
1.5 Analyzing system dump
If the customer complains that the system froze with 888 on the
display, check errpt for an entry like this:
C0AA5338 0614145601 U S SYSDUMP SYSTEM DUMP
This means that a system dump occurred on 14 June 2001 at 14:56 (the
errpt timestamp has the form MMDDhhmmYY).
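Reading the timestamp column by hand is error-prone, so here is a minimal sketch of how it can be split up, assuming the MMDDhhmmYY layout of the errpt timestamp. The function name errpt_time is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: split an errpt timestamp (assumed MMDDhhmmYY)
# into its parts.
errpt_time() {
    printf '%s\n' "$1" |
        sed 's/^\(..\)\(..\)\(..\)\(..\)\(..\)$/month=\1 day=\2 time=\3:\4 year=\5/'
}

errpt_time 0614145601    # month=06 day=14 time=14:56 year=01
```

To list only the entries with this identifier, errpt can be given the identifier with the -j flag (errpt -j C0AA5338).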
Run the following command to verify the status of the last system dump:
# sysdumpdev -L
0453-039
Device name: /dev/hd6
Major device number: 10
Minor device number: 2
Size: 63952384 bytes
Date/Time: Thu Jun 14 14:43:11 CST 2001
Dump status: 0
dump completed successfully
Dump copy filename: /var/adm/ras/vmcore.0
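The fields of interest in this output can be pulled out with a short awk filter. This is a sketch only, assuming the sysdumpdev -L output keeps the "label: value" layout shown above. The function name dump_summary is made up for this example, and the sample text is copied from the listing so that the snippet runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: read `sysdumpdev -L` output on stdin and print the
# dump status, the copy filename and the size on one line.
dump_summary() {
    awk -F': *' '
        /^Dump status:/        { status = $2 }
        /^Dump copy filename:/ { file = $2 }
        /^Size:/               { size = $2 }
        END { print "status=" status " file=" file " size=" size }
    '
}

# Sample text copied from the sysdumpdev -L listing above.
sample='0453-039
Device name: /dev/hd6
Major device number: 10
Minor device number: 2
Size: 63952384 bytes
Date/Time: Thu Jun 14 14:43:11 CST 2001
Dump status: 0
dump completed successfully
Dump copy filename: /var/adm/ras/vmcore.0'

printf '%s\n' "$sample" | dump_summary
# status=0 file=/var/adm/ras/vmcore.0 size=63952384 bytes
```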
Run the crash command to get a basic idea of the possible reasons for
the system dump.
The crash subcommands (trace -k, thread -r, status 0) provide a hint
about the origin of the problem:
# cd /var/adm/ras
# crash vmcore.0
Using /unix as the default namelist file.
> trace -k
STACK TRACE:
0x2ff3b400 (excpt=edffff54:40000000:00001004:edffff54:00000106) (intpri=0)
IAR: .remove_e_list+38 (00032888): tweqi r7,0x0
LR: .e_block_thread+40c (00034424)
2ff3b010: .e_sleep_thread+4c (0003497c)
2ff3b060: .[nspdd]+4144 (016ba4e4)
2ff3b100: .[nspdd]+2de4 (016b9184)
>q ;quits the crash command
=================================================================
In this case, trace -k shows a problem in nspdd, which is part of the
TSP driver.
thread -r and status 0 both point to the application process pltDc as
responsible for the system dump (it is the last process that ran).
The dump file can be copied to a CD and sent to IBM for further analysis.
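The way nspdd was read off the trace -k output can be sketched with a small filter that lists the names shown in square brackets in the stack trace. This is an illustration only: the function name trace_modules is made up, and it assumes that bracketed names in the trace always have the form .[name]+offset, as in the sample above.

```shell
#!/bin/sh
# Hypothetical helper: read `trace -k` output on stdin and list the
# distinct names that appear as .[name]+offset in the stack trace.
trace_modules() {
    sed -n 's/.*\.\[\([^]]*\)].*/\1/p' | sort -u
}

# Sample text copied from the trace -k output above.
trace='IAR: .remove_e_list+38 (00032888): tweqi r7,0x0
LR: .e_block_thread+40c (00034424)
2ff3b010: .e_sleep_thread+4c (0003497c)
2ff3b060: .[nspdd]+4144 (016ba4e4)
2ff3b100: .[nspdd]+2de4 (016b9184)'

printf '%s\n' "$trace" | trace_modules    # nspdd
```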