
8/8/2019 System Dump


How do I analyze a system dump?

faq52-1546

Posted: 7 Feb 02 (Edited 14 Apr 02)

1. System Dump

1.1 Overview

During a system crash (flashing 888 on the LCD), the dump process is invoked.

The dump process stores the entire kernel segment that resides in real memory (the kernel segment is segment 0) on a disk for later debugging. In effect, it creates a snapshot core image of the machine at the moment of the crash and saves it to disk.

The core also includes memory-resident user data (such as u-blocks).

If the dump process was successful, this is indicated by 0c0 on the LCD.

1.2 Dump Status Codes

The following dump progress indicators, or dump status codes, are part of a Type 102 message.

Note: When a lowercase c is listed, it displays in the lower half of the character position. Some systems produce 4-digit codes; the two leftmost positions can contain blanks or zeros. Use the two rightmost digits.

0c0 The dump completed successfully.

0c1 The dump failed due to an I/O error.

0c2 A dump, requested by the user, is started.

0c3 The dump is inhibited.

0c4 The dump device is not large enough.

0c5 The dump did not start, or the dump crashed.

0c6 Dumping to a secondary dump device.

0c7 Reserved.

0c8 The dump function is disabled.

0c9 A dump is in progress.

0cc Unknown dump failure.
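The status table can also be kept as a small lookup function, which is handy when codes are read off the LCD over the phone. This is a minimal sketch in POSIX sh; decode_dump_status is a hypothetical helper (not an AIX command) that applies the "use the two rightmost digits" rule from the note above.

```shell
#!/bin/sh
# Sketch: translate an LED dump status code into its meaning.
# decode_dump_status is a hypothetical helper, not part of AIX.
# Only the two rightmost characters are significant, so "0c0",
# "00c0" and " 0c0" all decode the same way.
decode_dump_status() {
    code=$1
    # Strip everything except the last two characters.
    suffix=${code#"${code%??}"}
    case $suffix in
        c0) echo "The dump completed successfully." ;;
        c1) echo "The dump failed due to an I/O error." ;;
        c2) echo "A dump, requested by the user, is started." ;;
        c3) echo "The dump is inhibited." ;;
        c4) echo "The dump device is not large enough." ;;
        c5) echo "The dump did not start, or the dump crashed." ;;
        c6) echo "Dumping to a secondary dump device." ;;
        c7) echo "Reserved." ;;
        c8) echo "The dump function is disabled." ;;
        c9) echo "A dump is in progress." ;;
        cc) echo "Unknown dump failure." ;;
        *)  echo "Not a dump status code: $code" ;;
    esac
}

decode_dump_status 0c4
```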

1.3 Crash Codes

Note: Some systems may produce 4-digit codes. If the leftmost digit of a 4-digit code is 0, use the three rightmost digits.

The following crash codes are part of a Type 102 message.

000 Unexpected system interrupt.



200 Machine check because of a memory bus error.

201 Machine check because of a memory timeout.

202 Machine check because of a memory card failure.

203 Machine check because of an out-of-range address.

204 Machine check because of an attempt to write to ROS.

205 Machine check because of an uncorrectable address parity.

206 Machine check because of an uncorrectable ECC error.

207 Machine check because of an unidentified error.

208 Machine check because of an uncorrectable L2 ECC error.

300 Data storage interrupt from the processor.

32x Data storage interrupt because of an I/O exception from IOCC.

38x Data storage interrupt because of an I/O exception from SLA.

400 Instruction storage interrupt.

500 External interrupt because of a scrub memory bus error.

501 External interrupt because of an unidentified error.

51x External interrupt because of a DMA memory bus error.

52x External interrupt because of an IOCC channel check.

53x External interrupt from an IOCC bus timeout; x represents the IOCC number.

54x External interrupt because of an IOCC keyboard check.

558 There is not enough memory to continue the IPL.

700 Program interrupt.

800 Floating point is not available.
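The crash code table can be encoded the same way. In this sketch, decode_crash_code is a hypothetical helper (not an AIX command); the table entries written with an "x" are matched with the shell's single-character wildcard, and the 4-digit rule from the note above is applied first.

```shell
#!/bin/sh
# Sketch: look up a crash code from the table above.
# decode_crash_code is a hypothetical helper, not part of AIX.
decode_crash_code() {
    code=$1
    # 4-digit form: if the leftmost digit is 0, use the three rightmost digits.
    case $code in
        0???) code=${code#0} ;;
    esac
    # "x" entries from the table are matched with the ? wildcard.
    case $code in
        000) echo "Unexpected system interrupt." ;;
        200) echo "Machine check because of a memory bus error." ;;
        201) echo "Machine check because of a memory timeout." ;;
        202) echo "Machine check because of a memory card failure." ;;
        203) echo "Machine check because of an out-of-range address." ;;
        204) echo "Machine check because of an attempt to write to ROS." ;;
        205) echo "Machine check because of an uncorrectable address parity." ;;
        206) echo "Machine check because of an uncorrectable ECC error." ;;
        207) echo "Machine check because of an unidentified error." ;;
        208) echo "Machine check because of an uncorrectable L2 ECC error." ;;
        300) echo "Data storage interrupt from the processor." ;;
        32?) echo "Data storage interrupt because of an I/O exception from IOCC." ;;
        38?) echo "Data storage interrupt because of an I/O exception from SLA." ;;
        400) echo "Instruction storage interrupt." ;;
        500) echo "External interrupt because of a scrub memory bus error." ;;
        501) echo "External interrupt because of an unidentified error." ;;
        51?) echo "External interrupt because of a DMA memory bus error." ;;
        52?) echo "External interrupt because of an IOCC channel check." ;;
        53?) echo "External interrupt from an IOCC bus timeout (IOCC number ${code#53})." ;;
        54?) echo "External interrupt because of an IOCC keyboard check." ;;
        558) echo "There is not enough memory to continue the IPL." ;;
        700) echo "Program interrupt." ;;
        800) echo "Floating point is not available." ;;
        *)   echo "Unknown crash code: $code" ;;
    esac
}

decode_crash_code 0300
```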

1.4 Enabling system dump

We can check the current dump settings via "smitty dump" or as follows:

# sysdumpdev -l

primary /dev/hd6

secondary /dev/sysdumpnull

copy directory /var/adm/ras

forced copy flag TRUE

always allow dump TRUE

dump compression OFF

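This means that dumps are written to the primary dump device (/dev/hd6) and copied to /var/adm/ras when the system reboots. The always allow dump flag is TRUE, which additionally lets an operator force a dump (for example, with the reset button or the dump key sequence).

If always allow dump = FALSE, such an operator-forced dump will not be taken; dumps triggered by a system crash are still written to the dump device.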

To change this, set the following field in smit:

Always ALLOW System Dump = true
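When the settings have to be checked on several machines, a script can read a single field from the sysdumpdev -l listing. This is a minimal sketch that assumes the "label, whitespace, value" layout shown above (spacing may differ between AIX levels); sysdumpdev_field is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read one setting from "sysdumpdev -l" output on standard input.
# sysdumpdev_field is a hypothetical helper; it assumes each line holds
# a label followed by whitespace and the value, as in the listing above.
sysdumpdev_field() {
    awk -v label="$1" '
        index($0, label) == 1 {
            value = substr($0, length(label) + 1)
            sub(/^[ \t]+/, "", value)   # padding between label and value
            sub(/[ \t]+$/, "", value)   # trailing blanks, if any
            print value
            exit
        }'
}

# Usage on AIX (not run here):
#   sysdumpdev -l | sysdumpdev_field "always allow dump"
```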

The /var filesystem must have enough space to accommodate a couple of system dumps.

Its size can be increased by, say, 100 MB as follows (chfs sizes are given in 512-byte blocks, so +200000 blocks is about 100 MB):



# chfs -a size=+200000 /var
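Rather than a round 100 MB, the increase can be sized from an actual dump. A minimal sketch, assuming the dump size in bytes is already known (for example from the Size: line of sysdumpdev -L in 1.5); blocks_for_dumps is a hypothetical helper, and it only prints the chfs command instead of running it. Dumps on another machine, or after a memory upgrade, may be larger.

```shell
#!/bin/sh
# Sketch: 512-byte blocks needed to hold COUNT dumps of BYTES each,
# rounded up. blocks_for_dumps is a hypothetical helper.
blocks_for_dumps() {
    bytes=$1
    count=$2
    total=$((bytes * count))
    echo $(( (total + 511) / 512 ))
}

# Two dumps the size of the one in 1.5 (63952384 bytes each).
# The command is printed for review, not executed.
blocks=$(blocks_for_dumps 63952384 2)
echo "chfs -a size=+$blocks /var"    # prints: chfs -a size=+249814 /var
```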

1.5 Analyzing system dump

If the customer complains that the system froze with 888 on the display, check errpt for an entry like this:

C0AA5338 0614145601 U S SYSDUMP SYSTEM DUMP

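This means that a system dump occurred. The entry was logged on 14 June 2001 at 14:56 (errpt timestamps have the form mmddHHMMyy). The entry is logged after the reboot, so the dump itself can be somewhat earlier; compare the Date/Time reported by sysdumpdev -L.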
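Reading the packed timestamp by eye is error-prone, so here is a small decoder. It is a sketch: decode_errpt_time is a hypothetical helper, and the century rule (two-digit years below 70 read as 20yy, the rest as 19yy) is an assumption, since errpt itself prints only two digits.

```shell
#!/bin/sh
# Sketch: decode an errpt TIMESTAMP (mmddHHMMyy) into a readable date.
# decode_errpt_time is a hypothetical helper. ASSUMPTION: years < 70
# are 20yy, otherwise 19yy.
decode_errpt_time() {
    case $1 in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "not an errpt timestamp: $1" >&2; return 1 ;;
    esac
    printf '%s\n' "$1" | awk '{
        split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mon, " ")
        m  = substr($0, 1, 2) + 0
        d  = substr($0, 3, 2) + 0
        hh = substr($0, 5, 2)
        mi = substr($0, 7, 2)
        yy = substr($0, 9, 2) + 0
        if (m < 1 || m > 12) exit 1     # reject impossible months
        year = (yy < 70) ? 2000 + yy : 1900 + yy
        printf "%d %s %d %s:%s\n", d, mon[m], year, hh, mi
    }'
}

# The entry quoted above:
decode_errpt_time 0614145601    # prints: 14 Jun 2001 14:56

# On AIX (not run here), the SYSDUMP timestamps could be listed with:
#   errpt | awk '$5 == "SYSDUMP" { print $2 }'
```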

Run the following command to verify the status of the last system dump:

# sysdumpdev -L

0453-039

Device name: /dev/hd6

Major device number: 10

Minor device number: 2

Size: 63952384 bytes

Date/Time: Thu Jun 14 14:43:11 CST 2001

Dump status: 0

dump completed successfully

Dump copy filename: /var/adm/ras/vmcore.0
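To check the last dump from a script, the "Label: value" lines can be reduced to a one-line summary. A minimal sketch assuming the layout shown above; last_dump_summary is a hypothetical helper, and it treats any Dump status other than 0 as a failure, matching the "0 / dump completed successfully" pairing in the sample output.

```shell
#!/bin/sh
# Sketch: summarize "sysdumpdev -L" output read from standard input.
# last_dump_summary is a hypothetical helper; it returns non-zero
# unless "Dump status" is 0.
last_dump_summary() {
    awk -F': *' '
        $1 == "Size"               { size = $2 + 0 }   # "63952384 bytes" -> 63952384
        $1 == "Dump status"        { status = $2 }
        $1 == "Dump copy filename" { copy = $2 }
        END {
            printf "status=%s size=%s copy=%s\n", status, size, copy
            if (status != "0") exit 1
        }'
}

# Usage on AIX (not run here):
#   sysdumpdev -L | last_dump_summary || echo "last dump did not complete"
```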

Run the crash command to get a basic idea of the possible reasons for the system dump.

The crash subcommands trace -k, thread -r, and status 0 give a hint about where the problem originated:

# cd /var/adm/ras

# crash vmcore.0

Using /unix as the default namelist file.

> trace -k

STACK TRACE:

0x2ff3b400 (excpt=edffff54:40000000:00001004:edffff54:00000106) (intpri=0)

IAR: .remove_e_list+38 (00032888): tweqi r7,0x0

LR: .e_block_thread+40c (00034424)

2ff3b010: .e_sleep_thread+4c (0003497c)

2ff3b060: .[nspdd]+4144 (016ba4e4)

2ff3b100: .[nspdd]+2de4 (016b9184)



> q      (quits the crash command)

=================================================================

In this case, trace -k points to a problem in the nspdd module, which is part of the TSP driver.

thread -r and status 0 both point to the application process pltDc as responsible for the system dump (it was the last process that ran).
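Spotting the driver in a long stack can be partly automated. This sketch assumes, as in the trace above, that a frame inside a loaded module prints as .[name]+offset while base kernel routines such as .e_sleep_thread have no brackets; kext_frames is a hypothetical helper, and its output is only a starting point for the analysis, not a diagnosis.

```shell
#!/bin/sh
# Sketch: list bracketed module names from saved "trace -k" output,
# each once, in the order they first appear.
# kext_frames is a hypothetical helper; it assumes module frames look
# like ".[nspdd]+4144" as in the sample trace.
kext_frames() {
    awk '{
        line = $0
        # Walk every ".[name]" occurrence on the line.
        while (match(line, /\.\[[A-Za-z0-9_.-]+\]/)) {
            name = substr(line, RSTART + 2, RLENGTH - 3)
            if (!(name in seen)) { seen[name] = 1; print name }
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

# Usage (not run here): save the trace -k output from the crash
# session to a file, then run:  kext_frames < trace.txt
```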

The dump file (here /var/adm/ras/vmcore.0) can be copied to a CD and sent to IBM for further analysis.