Analyzing Crash Dumps in Solaris Using MDB and SCAT


  • Prepared By

    MOHAIDEEN ABDUL KADER. F

    Enterprise Services

    ANALYSING THE CRASH DUMP IN SOLARIS

    USING MDB & SCAT (SOLARIS CAT)

  • ANALYZING THE CRASH DUMP IN SOLARIS USING MDB & SOLARIS CRASH ANALYSIS TOOL

    WHAT IS A CRASH DUMP?

    A crash dump is a disk copy of the physical memory of the computer at the time of a fatal system error.

    When a fatal operating system error occurs, a message describing the error is printed to the console. The operating system then generates a crash dump by writing the contents of physical memory to a predetermined dump device, which is typically a local disk partition. The dump device can be configured by way of dumpadm. Once the crash dump has been written to the dump device, the system will reboot.

    Fatal operating system errors can be caused by bugs in the operating system, its associated device drivers and loadable modules, or by faulty hardware. Whatever the cause, the crash dump itself provides invaluable information to aid in diagnosing the problem. Following an operating system crash, the savecore utility is executed automatically during boot to retrieve the crash dump from the dump device and write it to a pair of files in your file system named unix.X and vmcore.X, where X is an integer identifying the dump. Together, these data files form the saved crash dump. The directory in which the crash dump is saved on reboot can also be configured using dumpadm.
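
    For illustration, running dumpadm with no arguments prints the current configuration, while dumpadm -d sets the dump device and dumpadm -s sets the savecore directory. A minimal sketch (the device path and hostname below are placeholders, and the exact output varies by Solaris release):

    # dumpadm
          Dump content: kernel pages
           Dump device: /dev/dsk/c0t0d0s1 (dedicated)
    Savecore directory: /var/crash/hostname
      Savecore enabled: yes
    # dumpadm -d /dev/dsk/c0t0d0s1
    # dumpadm -s /var/crash/hostname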

    HOW TO ANALYZE THE DUMP FILES USING MDB

    MDB

    The mdb utility is an extensible utility for low-level debugging and editing of the live operating system, operating system crash dumps, user processes, user process core dumps, and object files. mdb provides a completely customizable environment for debugging these programs and scenarios, including a dynamic module facility that programmers can use to implement their own debugging commands to perform program-specific analysis. Each mdb module can be used to examine the program in several different contexts, including live and post-mortem.

    By default the dump files will be found in /var/crash/hostname/. The dumps come in pairs: vmcore.0 and unix.0.

    Feed these two files to mdb, the Modular Debugger, with the -k (kernel) option to perform the analysis:

    # mdb -k unix.0 vmcore.0

    Loading modules: [ unix krtld genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp ufs ip sctp

    usba lofs zfs random ipc md fcip fctl fcp crypto logindmux ptm nfs ]

    >
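
    Once at the mdb prompt you can explore the debugger itself: ::dcmds lists the available debugger commands, ::help gives usage for a specific dcmd, and ::quit (or $q) ends the session. A minimal sketch (output omitted):

    > ::dcmds
    > ::help stack
    > ::quit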

  • There are a couple of debugger commands that can give us the essence of what we need.

    The ::status command displays high-level information about this debugging session. Of particular use here are the dump's panic message and the OS release.

    > ::status

    debugging crash dump vmcore.0 (64-bit) from hostname

    operating system: 5.11 snv_43 (i86pc)

    panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffe80000ad3d0 addr=0 occurred in module

    "unix" due to a NULL pointer dereference

    dump content: kernel pages only

    The ::stack command provides a stack trace; this is the same trace you would have seen in syslog or on the console.

    > ::stack

    atomic_add_32()

    nfs_async_inactive+0x55(fffffe820d128b80, 0, ffffffffeff0ebcb)

    nfs3_inactive+0x38b(fffffe820d128b80, 0)

    fop_inactive+0x93(fffffe820d128b80, 0)

    vn_rele+0x66(fffffe820d128b80)

    snf_smap_desbfree+0x78(fffffe8185e2ff60)

    dblk_lastfree_desb+0x25(fffffe817a30f8c0, ffffffffac1d7cc0)

    dblk_decref+0x6b(fffffe817a30f8c0, ffffffffac1d7cc0)

    freeb+0x89(fffffe817a30f8c0)

    tcp_rput_data+0x215f(ffffffffb4af7140, fffffe812085d780, ffffffff993c3c00)

    squeue_enter_chain+0x129(ffffffff993c3c00, fffffe812085d780, fffffe812085d780, 1, 1)

    ip_input+0x810(ffffffffa23eec68, ffffffffaeab8040, fffffe812085d780, e)

    i_dls_link_ether_rx_promisc+0x266(ffffffff9a4c35f8, ffffffffaeab8040, fffffe812085d780)

    mac_rx+0x7a(ffffffffa2345c40, ffffffffaeab8040, fffffe812085d780)

    e1000g_intr+0xf6(ffffffff9a4b2000)

    av_dispatch_autovect+0x83(1a)

    intr_thread+0x50()
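
    If you need more detail than ::stack provides, mdb can also print the stack with frame pointers, and ::findstack will print the stack of any thread whose address you supply. A minimal sketch (the address below is the panic thread reported in the panic message and by ::panicinfo further on):

    > $C
    > fffffe80000adc80::findstack -v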

    The ::msgbuf command outputs the message buffer as it stood at the time of the crash:

    > ::msgbuf

    MESSAGE

    ....

    WARNING: IP: Hardware address '00:14:4f:xxxxxxx' trying to be our address xxxx

    WARNING: IP: Hardware address '00:14:4f:xxxx' trying to be our address xxxx

    panic[cpu0]/thread=fffffe80000adc80:

    BAD TRAP: type=e (#pf Page fault) rp=fffffe80000ad3d0 addr=0 occurred in module "unix" due to a

    NULL pointer dereference

    sched:

    #pf Page fault

    Bad kernel fault at addr=0x0

    .....

    The ::panicinfo command will give you lots of cryptic counter information; of most interest are the first three lines, which contain the CPU on which the panic occurred, the running thread, and the panic message. You'll notice these are commonly repeated and are the most useful pieces of information.

    > ::panicinfo

    cpu 0

    thread fffffe80000adc80

    message BAD TRAP: type=e (#pf Page fault) rp=fffffe80000ad3d0 addr=0 occurred in module

    "unix" due to a NULL pointer dereference

    rdi 0

    rsi 1

    rdx fffffe80000adc80

    rcx 0

    r8 0

    r9 fffffe80dba125c0

    rax 0

    rbx fffffe8153a36040

    rbp fffffe80000ad4e0

    r10 3e0

    r11 ffffffffaeab8040

    r12 ffffffffb7b4cac0

    r13 0

    r14 fffffe820d128b80

    r15 ffffffffeff0ebcb

    fsbase ffffffff80000000

    gsbase fffffffffbc27850

    ds 43

    es 43

    fs 0

    gs 1c3

    trapno e

    err 2

    rip fffffffffb838680

    cs 28

    rflags 10246

    rsp fffffe80000ad4c8

    ss 0

    gdt_hi 0

    gdt_lo defacedd

    idt_hi 0

    idt_lo 80300fff

    ldt 0

    task 60

    cr0 80050033

    cr2 0

    cr3 10821b000

    The ::cpuinfo -v command is one of the most useful. If multiple applications are running on a server, the most common question people want answered is "which application did it?". This command helps you determine that by displaying, complete with ASCII art, the threads and process names running on each CPU (NRUN).

    In the following example, we know the event occurred on CPU 0, so that is the one we want to look at. Note that the "sched" process should be interpreted as "kernel".

    > ::cpuinfo -v

    ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC

    0 fffffffffbc2f370 1b 1 0 165 no no t-1 fffffe80000adc80 sched

    | | |

    RUNNING PIL THREAD

    READY | 6 fffffe80000adc80

    EXISTS | - fffffe80daab6a20 ruby

    ENABLE |

    +--> PRI THREAD PROC

    99 fffffe8000b88c80 sched

    ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC

    1 ffffffff983b3800 1f 1 0 59 yes no t-0 fffffe80daac2f20 smtpd

    | |

    RUNNING PRI THREAD PROC

    READY 99 fffffe8000bacc80 sched

    QUIESCED

    EXISTS

    ENABLE

    ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC

    2 ffffffff9967a800 1f 2 0 -1 no no t-0 fffffe8000443c80

    (idle)

    | |

    RUNNING PRI THREAD PROC

    READY 99 fffffe8000b82c80 sched

    QUIESCED 60 fffffe80018f8c80 sched

    EXISTS

    ENABLE

    ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC

    3 ffffffff9967a000 1f 1 0 -1 no no t-0 fffffe8000535c80

    (idle)

    | |

    RUNNING PRI THREAD PROC

    READY 60 fffffe8000335c80 zsched

    QUIESCED

    EXISTS

    ENABLE

    The ::ps command allows us to see all running processes. Several flags are supported, including -z to display zone IDs.

    > ::ps -z

    S PID PPID PGID SID ZONE UID FLAGS ADDR NAME

    R 0 0 0 0 0 0 0x00000001 fffffffffbc25900 sched

    R 3 0 0 0 0 0 0x00020001 ffffffff9970d928 fsflush

    R 2 0 0 0 0 0 0x00020001 ffffffff9970e558 pageout

    R 1 0 0 0 0 0 0x42004000 ffffffff9970f188 init

    R 20534 1 20533 20533 24 1006 0x42010400 ffffffffb246f9b8 ruby

    R 20532 1 20531 20531 24 1006 0x42010400 fffffe8109674308 ruby

    R 20529 1 20528 20528 24 1006 0x42010400 fffffe80dc5602f0 ruby

    ...

    We can use ::pgrep to search for processes and use the resulting address for further digging. In the following example a Java process is found, and we then determine which zone that process was running in:

    > ::pgrep java

    S PID PPID PGID SID UID FLAGS ADDR NAME

    R 3628 1 3620 3574 0 0x42004400 fffffe80deeb3240 java

    > fffffe80deeb3240::print proc_t p_zone->zone_name

    p_zone->zone_name = 0xffffffffae0cef00 "testzone03"
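
    The same address can be fed to other dcmds and walkers for further digging. A minimal sketch (reusing the proc address from the example above; ::walk thread applied to a process address walks that process's threads, and ::findstack prints each one's kernel stack):

    > fffffe80deeb3240::walk thread | ::findstack -v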

  • ANALYSIS OF THE DUMP FILES USING SOLARIS CRASH ANALYSIS TOOL

    Solaris CAT has been around for a long time, but only as of version 5.0 has it been available for Solaris x86/x64.

    Installing Solaris CAT

    Download the Solaris CAT 5.0 package, uncompress it, and install it:

    # bunzip2 SUNWscat5.0-GA-i386.pkg.bz2

    # pkgadd -G -d ./SUNWscat5.0-GA-i386.pkg

    The following packages are available:

    1 SUNWscat Solaris Crash Analysis Tool (5.0 GA SV4622M)

    (i386) 5.0

    Select package(s) you wish to process (or 'all' to process

    all packages). (default: all) [?,??,q]: 1

    Processing package instance from

    Solaris Crash Analysis Tool (5.0 GA SV4622M)(i386) 5.0

    ...

    The package will, by default, install into /opt/SUNWscat. There are two binaries we're really interested in, found in the bin/ directory: scat and blast.

    The scat tool is the CLI interface to Solaris CAT and provides a shell that is a human-friendly re-implementation of mdb (no "::" prefixing of commands, etc.).

    The blast tool is a nice Java GUI interface to the CLI which adds a lot of "just click here" functionality and is excellent for testing and playing around.

    Note: some minimal but extremely useful HTML documentation is available at /opt/SUNWscat/docs/index.html.

    Add /opt/SUNWscat/bin to your PATH and then change to the directory containing your dumps (check the dump save directory with dumpadm; it is usually /var/crash/hostname/). For the .0 dumps use "scat 0", for the .1 dumps use "scat 1", and so on.

    # export PATH=$PATH:/opt/SUNWscat/bin

    # cd /var/crash/ev2-r01-s10/

    # ls -l

    total 14205330

    -rw-r--r-- 1 root root 2 Aug 25 07:49 bounds

    -rw-r--r-- 1 root root 1444762 Aug 25 07:43 unix.0

    -rw-r--r-- 1 root root 7268106240 Aug 25 07:49 vmcore.0

    # scat 0

    Solaris[TM] CAT 5.0 for Solaris 11 64-bit x86

    SV4622M, Jul 3 2008

    Copyright 2008 Sun Microsystems, Inc. All rights reserved.

    Use is subject to license terms.

    Feedback regarding the tool should be sent to [email protected]

    Visit the Solaris CAT blog at http://blogs.sun.com/SolarisCAT

    opening unix.0 vmcore.0 ...dumphdr...symtab...core...done

    loading core data: modules...symbols...ctftype: unknown type struct panic_trap_info

    CTF...done

    core file: /var/crash/xxxxxxxx/vmcore.0

    user: Super-User (root:0)

    release: 5.11 (64-bit)

    version: snv_67

    machine: i86pc

    node name: xxxxxxxxxxxxxxxxxx

    system type: i86pc

    hostid: xxxxxxxx

    dump_conflags: 0x10000 (DUMP_KERNEL) on /dev/dsk/c0t0d0s1(24.0G)

    time of crash: Mon Aug 25 07:41:00 GMT 2008 (core is 13 days old)

    age of system: 91 days 22 hours 49 minutes 50.97 seconds

    panic CPU: 1 (8 CPUs, 31.9G memory)

    panic string: page_free pp=ffffff0007243bd8, pfn=11228e, lckcnt=0, cowcnt=0 slckcnt = 0

    sanity checks: settings...vmem...

    WARNING: FSS thread 0xffffff097d1e3400 on CPU2 using 99%CPU

    WARNING: FSS thread 0xffffff09fddbab40 on CPU3 using 99%CPU

    sysent...clock...misc...

    NOTE: system has 54 non-global zones

    done

    SolarisCAT(vmcore.0/11X)>

    When CAT is unleashed on a dump, several "sanity checks" are run which can point out glaring known issues. There is an HTML document in the docs/ directory which outlines all the various sanity checks. These checks alone make CAT a must-have tool. Sanity check output comes in two varieties: "WARNING", which indicates something out of whack that may have been the cause of or a contributor to the crash, and "NOTE", which is unlikely to be the cause but is of interest.

    We can see in the above example two warnings telling us that two threads were consuming 99% of a CPU. It also notes that the system is running 54 non-global zones.

    The available commands are broken down into categories which can be seen using the "help" command. The first group is "Initial Investigation" and includes: analyze, coreinfo, msgbuf, panic, stack, stat, and toolinfo.

    The "analyze" command output:

    SolarisCAT(vmcore.0/11X)> analyze

    core file: /var/crash/xxxxxx/vmcore.0

    user: Super-User (root:0)

    release: 5.11 (64-bit)

    version: snv_67

    machine: i86pc

    node name: xxxxxx

    system type: i86pc

    hostid: xxxxx

    dump_conflags: 0x10000 (DUMP_KERNEL) on /dev/dsk/c0t0d0s1(24.0G)

    time of crash: Mon Aug 25 07:41:00 GMT 2008 (core is 13 days old)

    age of system: 91 days 22 hours 49 minutes 50.97 seconds

    panic CPU: 1 (8 CPUs, 31.9G memory)

    panic string: page_free pp=ffffff0007243bd8, pfn=11228e, lckcnt=0, cowcnt=0 slckcnt = 0

    ==== panic thread: 0xfffffffef4ce5dc0 ==== CPU: 1 ====

    ==== panic user (LWP_SYS) thread: 0xfffffffef4ce5dc0 PID: 10156 on CPU: 1 ====

    cmd: /opt/local/sbin/httpd -k start

    t_procp: 0xffffffff06595e50

    p_as: 0xffffffff093490e0 size: 47374336 RSS: 3125248

    hat: 0xffffffff092a9480 cpuset: 1

    zone: address translation failed for zone_name addr: 8 bytes @ 0x3

    t_stk: 0xffffff00486bcf10 sp: 0xffffff00486bc880 t_stkbase: 0xffffff00486b8000

    t_pri: 3(FSS) pctcpu: 0.380035

    t_lwp: 0xfffffffefe61ab60 lwp_regs: 0xffffff00486bcf10

    mstate: LMS_SYSTEM ms_prev: LMS_SYSTEM

    ms_state_start: 2 minutes 31.229022230 seconds earlier

    ms_start: 2 minutes 31.343582414 seconds earlier

    psrset: 0 last CPU: 1

    idle: 0 ticks (0 seconds)

    start: Mon Aug 25 07:41:00 2008

    age: 0 seconds (0 seconds)

    syscall: #131 memcntl(, 0x0) ()

    tstate: TS_ONPROC - thread is being run on a processor

    tflg: T_PANIC - thread initiated a system panic

    T_DFLTSTK - stack is default size

    tpflg: TP_MSACCT - collect micro-state accounting information

    tsched: TS_LOAD - thread is in memory

    TS_DONT_SWAP - thread/LWP should not be swapped

    TS_RUNQMATCH

    pflag: SMSACCT - process is keeping micro-state accounting

    SMSFORK - child inherits micro-state accounting

    pc: unix:vpanic_common+0x13b: addq $0xf0,%rsp

    unix:vpanic_common+0x13b()

    unix:panic+0x9c()

    unix:page_free+0x22e()

    unix:page_destroy+0x100()

    genunix:fs_dispose+0x2e()

    genunix:fop_dispose+0xdc()

    genunix:pvn_getdirty+0x1f0()

    zfs:zfs_putpage+0x129()

    genunix:fop_putpage+0x65()

    genunix:segvn_sync+0x39f()

    genunix:as_ctl+0x1f2()

    genunix:memcntl+0x709()

    unix:_syscall32_save+0xbf()

    -- switch to user thread's user stack --

    This output provides a vast array of useful details, including:

    System summary, including OS release and version, architecture, hostname, and hostid, as well as the number of CPUs and memory
    Time of crash and previous uptime ("age of system")
    The panic string and the CPU it occurred on
    The thread that caused the panic and its details, including the command (argc & argv), its memory footprint (size & RSS), and zone
    The thread's state information, run time, start time, and current syscall
    The call stack

    What most people are really looking for when doing core analysis is to determine which application was responsible, and this output provides that data with great clarity. Let's dig into it a bit more explicitly.

  • Based on the above "analyze" output we can see that:

    The system is an 8-CPU x86 box running snv_67 (Solaris Nevada Build 67) in 64-bit mode with 32 GB of RAM.
    The system crashed on Aug 25th at 7:41 AM GMT; it had previously been up for 91 days.
    The system panicked on a "page_free" call, on CPU 1.
    The running thread was "httpd -k start", an Apache worker process.
    The process had PID 10156, consumed 3.1 MB of physical memory (RSS), and had a virtual size of 47 MB.
    The process was using less than 1% (pctcpu) of CPU 1, was using the Fair Share Scheduler (FSS), and was on processor set (psrset) 0.
    The process started on Aug 25th at 7:41 AM GMT and was 0 seconds old when it crashed, possibly a forked worker gone bad.

    IN DEPTH ANALYSIS OF THE DUMP FILE USING SCAT TOOL

    You'll recall that during the sanity checks at startup it noted two threads consuming full CPUs. We can feed the thread address to the "thread" command to get details on them:

    SolarisCAT(vmcore.0/11X)> thread 0xffffff097d1e3400

    ==== user (LWP_SYS) thread: 0xffffff097d1e3400 PID: 27446 on CPU: 2 ====

    cmd: nano svn-commit.tmp

    t_procp: 0xffffffff2e908ab0

    p_as: 0xffffffff10402ee0 size: 2772992 RSS: 1642496

    hat: 0xffffffff102f6b48 cpuset: 2

    zone: address translation failed for zone_name addr: 8 bytes @ 0x2

    t_stk: 0xffffff004e47ef10 sp: 0xffffff003d3fcf08 t_stkbase: 0xffffff004e47a000

    t_pri: 26(FSS) pctcpu: 99.306175

    t_lwp: 0xffffffff202a78b0 lwp_regs: 0xffffff004e47ef10

    mstate: LMS_SYSTEM ms_prev: LMS_USER

    ms_state_start: 2 minutes 31.228983791 seconds earlier

    ms_start: 39 days 19 hours 11 minutes 8.989252296 seconds earlier

    psrset: 0 last CPU: 2

    idle: 9 ticks (0.09 seconds)

    start: Wed Jul 16 12:30:07 2008

    age: 3438653 seconds (39 days 19 hours 10 minutes 53 seconds)

    syscall: #98 sigaction(, 0x0) ()

    tstate: TS_ONPROC - thread is being run on a processor

    tflg: T_DFLTSTK - stack is default size

    tpflg: TP_TWAIT - wait to be freed by lwp_wait

    TP_MSACCT - collect micro-state accounting information

    tsched: TS_LOAD - thread is in memory

    TS_DONT_SWAP - thread/LWP should not be swapped

    TS_RUNQMATCH

    pflag: SMSACCT - process is keeping micro-state accounting

    SMSFORK - child inherits micro-state accounting

    pc: unix:panic_idle+0x23: jmp -0x2 (unix:panic_idle+0x23)

    unix:panic_idle+0x23()

    0xffffff003d3fcf60()

    -- error reading next frame @ 0x0 --

    So, using the "thread" command we can get full granularity on a given thread. In fact, using the "tlist" command you can dump this information for every thread on the system at the time of the crash.
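
    A minimal sketch of the invocation (output omitted; per the description above it prints one block per thread in the same format as "thread"):

    SolarisCAT(vmcore.0/11X)> tlist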

    Another nifty command is "tunables". It displays the current value (at the time of the dump) and the default value of each tunable. If someone has been experimenting on the production systems, this will clue you in.

    SolarisCAT(vmcore.0/11X)> tunables

    Tunable Name   Current Value   Default Value   Units   Description
    physmem        8386375         *               pages   Physical memory installed in system.
    freemem        376628          *               pages   Available memory.
    avefree        338943          *               pages   Average free memory in the last 30 seconds

    .........

    Using the "dispq" command we can look at the dispatch queues (run queues). This answers the question "what other processes were running on CPU at the time of the crash?". Again, using the thread addresses we can dig into them with "thread":

    SolarisCAT(vmcore.0/11X)> dispq

    CPU thread pri PID cmd

    0 @ 0xfffffffffbc26bb0 0xffffff003d005c80 -1 (idle)

    pri 60 -=> 0xffffff004337dc80 60 0 sched

    1 @ 0xfffffffec6634000 P 0xfffffffef4ce5dc0 P 3 10156 /opt/local/sbin/httpd -k start

    2 @ 0xfffffffec662f000 0xffffff097d1e3400 26 27446 nano svn-commit.tmp

    3 @ 0xfffffffec66f4800 0xffffff09fddbab40 25 21329 java -jar xxxxx.jar --ui=console

    4 @ 0xfffffffec66ea800 0xffffff003d414c80 -1 (idle)

    pri 60 -=> 0xffffff0048b12c80 60 0 sched

    5 @ 0xfffffffec6770800 0xffffff003d4b0c80 -1 (idle)

    6 @ 0xfffffffec6770000 0xffffff003d53bc80 -1 (idle)

    7 @ 0xfffffffec6762000 0xffffff003d58fc80 -1 (idle)

    part thread pri PID cmd

    0 @ 0xfffffffffbc4eef0

  • The "zfs" command can show us the pool(s), their configuration, read/write/checksum/error stats, and even ARC stats.

    SolarisCAT(vmcore.0/11X)> zfs -e

    ZFS spa @ 0xfffffffec6c21540

    Pool name: zones

    State: ACTIVE

    VDEV Address State Aux Description

    0xfffffffec0a9e040 FAULTED - root

    READ WRITE FREE CLAIM IOCTL

    OPS 0 0 0 0 0

    BYTES 0 0 0 0 0

    EREAD 0

    EWRITE 0

    ECKSUM 0

    VDEV Address State Aux Description

    0xfffffffec0a9eac0 FAULTED - /dev/dsk/c0t1d0s0

    READ WRITE FREE CLAIM IOCTL

    OPS 74356305 578263155 0 0 0

    BYTES 757G 10.4T 0 0 0

    EREAD 0

    EWRITE 0

    ECKSUM 0

    SolarisCAT(vmcore.0/11X)> zfs arc

    ARC (Adaptive Replacement Cache) Stats:

    hits 77708247444

    misses 1930348

    demand_data_hits 74303514929

    demand_data_misses 1325511

    demand_metadata_hits 620388795

    demand_metadata_misses 160708

    prefetch_data_hits 1361651307

    ....

  • SOLARIS CRASH ANALYSIS TOOL FULL COMMAND LIST

    Apart from the commands mentioned above, there are various commands available for in-depth analysis. They are:

    Initial Investigation:

    analyze coreinfo msgbuf panic stack stat toolinfo

    General Commands:

    analyze clockinfo coreinfo cyclic demangle eckstat environ exit help intr kstat modinfo

    msgbuf stat symbols taskq ttrace tunables

    Memory Dump/Display Commands:

    bigdump buf flip kseg mdump nvlist pdump pkma r rd rd16 rd32 rd64 rd8 rdb rdc

    rdd rdf rdh rdi rdl rdq rds rdw sdump seg skma strsum vmem wr

    Data Conversion Commands:

    2base 2dec 2double 2float 2hex 2string 2time bits calc decode deltatime pid2paddr size

    Process/Thread Information:

    classtbl panic pid2paddr proc sig thread tlist upcount

    Stacks/Traps:

    findaddr_stack frame regs stack trap

    Searching Memory/Core:

    findport findval stack tlist

    File and Filesystem Information:

    autofs bigdump buf cgcheck dnlc findfiles getpath inode nfs node tmpfs vfssw vfstype

    CPU and Dispatch Queues:

    callout cpu dispq disptbl

    Memory Information:

    anon iommu ipc kma kseg kstat map memerr meminfo page pkma resvswap seg

    sfmmu skma swapinfo vmem whatis xmmu

    Network/IPC:

    findport ifconf ipc ndd nfs nstat pdump pkma stream

    Type Database Commands:

    ctf sarray savl sdump shash skma slist slistt stree stype typedb

    Lock Display Commands:

    cv lck mutex rwlock sema sleepq tstile

    Device Information Commands:

    dev qlcfc

    System Files/Tunables:

    etcsystem msgbuf name2major path2inst tunables vfstab

    Disassembler Commands:

    codepath dis rdi sim

    Data Structure Consistency Checks:

    anon callout cgcheck dispq dnlc kma seg sfmmu symbols upcount vmem

    Miscellaneous Commands:

    clust door legal namelist pool project

    pty rctl sanitize svm task toolinfo trans zfs zone

    Tool Configuration:

    base color refclock scat_version scatenv write

    I hope this document helps you get an idea of how easy it is to dig deeply into your crash dumps using Solaris CAT and mdb. Solaris CAT is a powerful and robust tool, and it helps greatly in providing an RCA by identifying the reason for a sudden panic and reboot of the server.