Linux Kernel Crashdump

Analyzing Linux kernel crash dumps

Marian Marinov hackman @ irc.freenode.net https://github.com/hackman

Who am I?

What will I cover?

How to gather crash data

How to analyze crash dumps

Ways to gather crash data

Serial console, netconsole

Kmsg dumpers: ramoops, mtdoops

Kdump: core dump of the whole kernel

Pstore: persistent store filesystem

NVRAM: Non-Volatile RAM (in progress)

MCE: hardware errors

Gather the OOPS

Serial console: not widespread

limited to a few meters from the machine

Netconsole: sends oopses over the network

if compiled as a module, it can be reconfigured

relies on UDP

if the network is broken, or the network stack itself is the one experiencing issues - IT DOES NOT WORK :)
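A minimal sketch of the netconsole setup, assuming the usual module parameter layout of source port, source IP and interface followed by target port, target IP and target MAC. The helper name netconsole_param and all addresses are made up for illustration. The command is only printed here, not executed.

```shell
#!/bin/sh
# Build the netconsole= parameter string:
#   <src-port>@<src-ip>/<dev>,<dst-port>@<dst-ip>/<dst-mac>
# $1 src port, $2 src IP, $3 interface, $4 dst port, $5 dst IP, $6 dst MAC
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example values only (documentation addresses, hypothetical MAC).
PARAM=$(netconsole_param 6665 192.0.2.10 eth0 6666 192.0.2.20 00:11:22:33:44:55)

# Print the command that would load the module with this target.
echo "modprobe netconsole netconsole=$PARAM"
```

Any UDP listener on the target host and port can collect the messages. Since delivery is UDP, lost packets are not retransmitted.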

Pstore

Pstore: persistent store filesystem

Relies on APEI (ACPI Platform Error Interface) or UEFI

Provides a generic FS layer for lower persistent storage

Relies on platform drivers

Available since 2010

# dmesg | grep persistent
pstore: Registered erst as persistent store backend
# mount -t pstore none /sys/fs/pstore

Pstore

# ls -l /sys/fs/pstore
total 0
-r--r--r--. 1 root root 1016 May 13 07:46 dmesg-efi-1
-r--r--r--. 1 root root 1012 May 13 07:46 dmesg-efi-10
-r--r--r--. 1 root root  948 May 13 07:46 dmesg-efi-11
-r--r--r--. 1 root root  943 May 13 07:46 dmesg-efi-2
-r--r--r--. 1 root root  677 May 13 07:46 dmesg-efi-3
-r--r--r--. 1 root root  993 May 13 07:46 dmesg-efi-4
-r--r--r--. 1 root root 1010 May 13 07:46 dmesg-efi-5
-r--r--r--. 1 root root  999 May 13 07:46 dmesg-efi-6
-r--r--r--. 1 root root  976 May 13 07:46 dmesg-efi-7
-r--r--r--. 1 root root 1006 May 13 07:46 dmesg-efi-8
-r--r--r--. 1 root root  949 May 13 07:46 dmesg-efi-9

Pstore

# cat /sys/fs/pstore/dmesg-efi-4
Panic#2 Part4
[ 306.271891] IP: [] sysrq_handle_crash+0x16/0x20
[ 306.271917] PGD 80a98c067 PUD 807e8e067 PMD 0
[ 306.271937] Oops: 0002 [#1] SMP
[ 306.271952] Modules linked in: tcp_lp rfcomm fuse xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ........

sysrq_handle_crash on the IP: line is the function that triggered the crash
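A minimal sketch of saving the stored records after the reboot, assuming pstore is already mounted under /sys/fs/pstore as shown earlier. The function name save_pstore and the destination path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Copy every dmesg-* record from a pstore mount into an archive directory.
# $1: pstore mount point, $2: destination directory (created if missing)
save_pstore() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for f in "$src"/dmesg-*; do
        [ -e "$f" ] || continue        # glob did not match: nothing stored
        cp -p "$f" "$dst/" || return 1 # -p keeps the record timestamps
    done
}

# Usage (as root, after the reboot):
#   save_pstore /sys/fs/pstore /var/crash/pstore-$(date +%Y%m%d-%H%M%S)
```

A large oops is split into several numbered parts (the "Part4" in the header above), so it is worth copying all records, not only the first one.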

Kmsg dumpers

ramoops: uses pstore to store oopses and panics

available since 2011

mtdoops: uses Memory Technology Devices (MTD) found on some SoCs

available since 2007

NVRAM: still not widely available

MCE: mainly EDAC (Error Detection And Correction)

Kdump

No dependencies, theoretically ideal, but...

Based on kexec

Not all architectures support kexec

Not easy to set up

Boots a second kernel to retrieve the crash vmcore

Almost useless in cases of HW failure

Needs assistance of other tools for analysis

Kdump

A second kernel needs to be started when crashing

Not all drivers work fine in the second kernel

Very limited memory for the second kernel

We need to construct a new initrd for the second kernel
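Because the second kernel runs in a small reserved memory area, a first sanity check is whether that area was requested at boot. A small sketch, assuming the reservation is made with the crashkernel= boot parameter. The helper name crashkernel_arg is made up for illustration.

```shell
#!/bin/sh
# Print the value of crashkernel= from a kernel command line file.
# Prints nothing if the parameter is absent.
# $1: file holding the command line (normally /proc/cmdline)
crashkernel_arg() {
    # One parameter per line, then keep only the crashkernel value.
    tr ' ' '\n' < "$1" | sed -n 's/^crashkernel=//p'
}

# Usage:
#   crashkernel_arg /proc/cmdline
#   cat /sys/kernel/kexec_crash_loaded    # 1 once a crash kernel is loaded
```

An empty result means no memory was reserved, so loading the second kernel will fail no matter how the initrd was built.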

Analyzing the crashed kernel

general protection fault: 0000 [#2] SMP
Modules linked in: module list here
CPU: 4 PID: 6839 Comm: iceweasel Tainted: G D 3.16-2-amd64 #1 Debian 3.16.3-2
Hardware name: Gigabyte 990FXA-UD5, BIOS FB 01/23/2013
task: ffff88009c063370 ti: ffff8801f7c94000 task.ti: ffff8801f7c94000
RIP: 0010:[] [] __d_lookup_rcu+0xc8/0x160
RSP: 0018:ffff8801f7c97cb0 EFLAGS: 00010212
RAX: 0000000000000015 RBX: ffff8800984a2b60 RCX: 000000000000000c
RDX: ffff0800984a2b90 RSI: ffff8801f7c97e10 RDI: 6461657262757065
RBP: ffff8800984a2cd8 R08: ffff88009c19308c R09: ffff88009c19308c
R10: 0000000000000015 R11: ffffffffffffffff R12: ffff8800984a2b58
R13: 00000015067b0bda R14: ffff8801f7c97e10 R15: ffff8801f7c97d0c
FS: 00007f4f52f7d740(0000) GS:ffff88023fd00000(0000) knlGS:00000000f55ffb40
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4f37200018 CR3: 0000000206380000 CR4: 00000000000007e0
Stack:
 ffff88009c063370 ffff8801f4e60c10 ffff88009c063370 ffff8801f7c97d78
 ffff8801f7c97d68 0000000000000041 ffff8802356b80a0 ffff8800984a2cd8
 ffff8801f7c97e00 ffffffff811aedde 02ffff8000000000 0000000200000000
Call Trace:
 [] ? lookup_fast+0x3e/0x2b0
 [] ? path_lookupat+0x155/0x780
 [] ? jfs_readdir+0x1ba/0xf90 [jfs]
 [] ? filename_lookup+0x26/0xc0
 [] ? user_path_at_empty+0x54/0x90
 [] ? from_kgid_munged+0xe/0x20
 [] ? cp_new_stat+0x13a/0x160
 [] ? vfs_fstatat+0x46/0x90
 [] ? SYSC_newstat+0x1a/0x40
 [] ? system_call_fast_compare_end+0x10/0x15
Code: 6b 18 75 cf 41 89 07 4d 89 c8 48 8b 53 20 44 89 d0 eb 12 48 39 fe 75 bb 48 83 c2 08 49 83 c0 08 83 e8 08 74 26 49 8b 38 83 f8 07 8b 32 77 e3 8d 0c c5 00 00 00 00 4c 89 d8 48 31 fe 48 d3 e0
RIP [] __d_lookup_rcu+0xc8/0x160
 RSP
---[ end trace d7e9304af4a09ee6 ]---
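Every frame in the oops above has the form function+offset/size. A rough sketch of pulling the function and offset pairs out of a saved oops, so they can be looked up one by one. The function name oops_symbols is hypothetical, and the pattern assumes the symbol is preceded by whitespace, as in the trace above.

```shell
#!/bin/sh
# Read oops text on stdin and print "function offset" for every
# function+0xOFFSET/0xSIZE token found (one per line).
oops_symbols() {
    sed -n 's|.*[[:space:]]\([A-Za-z_][A-Za-z0-9_.]*\)+\(0x[0-9a-f]*\)/0x[0-9a-f]*.*|\1 \2|p'
}

# Usage:
#   oops_symbols < saved-oops.txt
```

Lines without such a token (registers, stack words, the Code: bytes) are skipped.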


Try using ksymoops on the collected oops/panic

System.map - kernel function addresses

/proc/ksyms - list of kernel symbols (/proc/kallsyms on 2.6 and later kernels)

/proc/kcore - the system memory

vmlinux - the uncompressed kernel, can be disassembled using objdump


# gdb namei.o
(gdb) list *(lookup_fast+0x3e)
0x48fe is in lookup_fast (fs/namei.c:1551).
1546     * going to fall back to non-racy lookup.
1547     */
1548    if (nd->flags & LOOKUP_RCU) {
1549        unsigned seq;
1550        bool negative;
1551        dentry = __d_lookup_rcu(parent, &nd->last, &seq);
1552        if (unlikely(!dentry)) {
1553            if (unlazy_walk(nd, NULL, 0))
1554                return -ECHILD;
1555            return 0;


# cd /usr/src/kernels/KERNEL
# grep -r lookup_fast
.....
fs/namei.c:static int lookup_fast(struct nameidata *nd,
.....
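The gdb lookup can also be run non-interactively, which is handy when walking a whole call trace. A sketch, assuming gdb is installed and the object file was built with debug info. resolve_frame is a hypothetical helper name.

```shell
#!/bin/sh
# Resolve a "function+offset" frame from an oops to a source line.
# $1: object file with debug info (for example fs/namei.o)
# $2: function name, $3: offset inside the function (for example 0x3e)
resolve_frame() {
    # -batch exits after the commands given with -ex have run.
    gdb -batch -ex "list *($2+$3)" "$1"
}

# Usage (from the top of the kernel build tree):
#   resolve_frame fs/namei.o lookup_fast 0x3e
```

The object file has to come from the same build as the crashed kernel, otherwise the offsets point at unrelated lines.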

Using the crash utility

Download and build the latest version https://github.com/crash-utility/crash

Run it on the same architecture: mixing 32-bit and 64-bit is not supported

Check out the help information http://people.redhat.com/anderson/help.html


Most used commands:

bt - backtrace

log - print the kernel buffer

ps - list all processes

files - list all file descriptors related to task/PID

whatis - gives you data or type information



Usually you would examine the crash manually, but a little automation can be nice:

# cat extract-basic-info
bt
log
ps
exit
# crash vmlinux vmcore < extract-basic-info > report


RIP: 0010:[] [] __list_del_entry+0x0/0xb0
RSP: 0018:ffff88002a4e3d20 EFLAGS: 00010006
RAX: dead000000000100 RBX: dead000000000100 RCX: 0000000000000001
RDX: 0000000000000101 RSI: 0000000000000001 RDI: dead000000000100
RBP: ffff88006a6e6028 R08: 0000000000000101 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88006a6e6008
R13: 0000000000000246 R14: deacffffffffff18 R15: ffff880036a22098
FS: 00007f2970ff9700(0000) GS:ffff88006fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efdc457c000 CR3: 0000000069a27000 CR4: 00000000000006f0


Code Segment (CS): 0010

the two lowest bits of CS hold the privilege level

if the value is even (level 0): kernel space

if it is odd (level 3): user space
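The CS check can be written down as a tiny helper. This is only a sketch for x86, where the lowest two bits of the selector are the privilege level. The function name cs_mode is made up.

```shell
#!/bin/sh
# Tell whether a CS selector (hex, as printed in the oops) belongs to
# kernel space or user space by masking out the privilege level bits.
cs_mode() {
    cpl=$(( 0x$1 & 3 ))
    case $cpl in
        0) echo kernel ;;
        3) echo user ;;
        *) echo "ring $cpl" ;;
    esac
}

cs_mode 0010   # CS value from the register dump above, prints "kernel"
```

A user space selector such as 0033 gives "user", which means the fault happened while the CPU was executing user code.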


[4005105.249407] Kernel panic - not syncing: Hard LOCKUP
[4005105.249409] CPU: 16 PID: 18891 Comm: kworker/u96:0 Tainted: G O 4.4.14-clouder2 #1
[4005105.249411] Workqueue: ipoib_wq ipoib_mcast_join_task [ib_ipoib]
[4005105.249412] 0000000000000000 ffff883fff285b10 ffffffff812f4269 ffffffff81a05545
[4005105.249414] ffff883fff285ba0 ffff883fff285b90 ffffffff8112738d ffffffff00000008
[4005105.249416] ffff883fff285ba0 ffff883fff285b38 0000000000000000 0000000000000046
[4005105.249417] Call Trace:
[4005105.249418] [] dump_stack+0x67/0x9e
[4005105.249422] [] panic+0xc5/0x20b
[4005105.249424] [] watchdog_overflow_callback+0xdd/0xe0
[4005105.249426] [] __perf_event_overflow+0x88/0x250
[4005105.249427] [] perf_event_overflow+0x14/0x20
[4005105.249429] [] intel_pmu_handle_irq+0x1c8/0x430
[4005105.249432] [] ? vunmap_page_range+0x1a6/0x310
[4005105.249434] [] ? unmap_kernel_range_noflush+0x11/0x20
[4005105.249436] [] ? ghes_copy_tofrom_phys+0x118/0x1e0
[4005105.249437] [] ? native_apic_wait_icr_idle+0x1f/0x30
[4005105.249439] [] ? arch_irq_work_raise+0x35/0x40
[4005105.249441] [] perf_event_nmi_handler+0x28/0x50
[4005105.249443] [] nmi_handle+0x6d/0x140
[4005105.249445] [] default_do_nmi+0x40/0x100
[4005105.249446] [] do_nmi+0x101/0x150
[4005105.249447] [] end_repeat_nmi+0x1a/0x1e
[4005105.249450] [] ? ipoib_mcast_join_task+0x14c/0x330 [ib_ipoib]



[4005105.249450] [] ? ipoib_mcast_join_task+0x14c/0x330 [ib_ipoib]

crash> list *(ipoib_mcast_join_task+0x14c)
list: invalid argument: *(ipoib_mcast_join_task+0x14c)
crash>


# grep -r ipoib_mcast_join_task
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:void ipoib_mcast_join_task(struct work_struct *work)
# gdb drivers/infiniband/ulp/ipoib/ipoib_multicast.o
(gdb) list *(ipoib_mcast_join_task+0x14c)
0xffc is in ipoib_mcast_join_task (drivers/infiniband/ulp/ipoib/ipoib_multicast.c:641)
636                 }
637             } else if (!delay_until ||
638                    time_before(mcast->delay_until, delay_until))
639                 delay_until = mcast->delay_until;
640         }
641     }
642
643     mcast = NULL;
644     ipoib_dbg_mcast(priv, "successfully started all multicast joins\n");
645


# crash vmlinux vmcore
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
....
5093 1 4 ffff880856d30c80 IN 0.0 36276 6340 hald
....
crash> whatis ffff880856d30c80
unsigned long
crash> p ffff880856d30c80
$1 = 18446612168130628736



crash> task 5093
PID: 5093 TASK: ffff880856d30c80 CPU: 4 COMMAND: "hald"
struct task_struct {
  state = 1,
  stack = 0xffff8808429b0000,
  flags = 4211008,
  ptrace = 0,
  real_cred = 0xffff880845a2ec00,
  cred = 0xffff880845a2ec00,
  on_cpu = 0,
  prio = 120,
  static_prio = 120,
  normal_prio = 120,



crash> whatis task_struct.cred
struct task_struct {
  [1456] const struct cred *cred;
}
crash> whatis struct cred
struct cred {
  kuid_t uid;
  kuid_t suid;
  kuid_t euid;
  kuid_t fsuid;
  kernel_cap_t cap_inheritable;
  kernel_cap_t cap_permitted;
  kernel_cap_t cap_effective;
  struct user_struct *user;
  struct user_namespace *user_ns;
  struct group_info *group_info;


crash> struct cred 0xffff880845a2ec00
struct cred {
  usage = {
    counter = 48
  },
  uid = {
    val = 1849
  },
  gid = {
    val = 1845
  },
  suid = {
    val = 1849
  },
  sgid = {
    val = 1845
  },


Most used commands:

sys - show the system information stored in the crash

ipcs - show the shared memory segments

vm - examine the virtual memory in the crash dump

dev - list all devices
