43
Copyright 2013 FUJITSU LIMITED kdump: Improve reliability on kernel switching May 30th, 2013 Takao Indoh Fujitsu Limited LinuxCon Japan 2013

kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

kdump: Improve reliability on

kernel switching

May 30th, 2013

Takao Indoh

Fujitsu Limited

LinuxCon Japan 2013

Page 2: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Fujitsu has been working for kdump to

improve its reliability

Now kdump is very reliable, but still has

problems

Recently IOMMU becomes important for KVM

• PCI passthrough, SR-IOV

Kdump fails if IOMMU is enabled

Need to eradicate remaining problems of

kdump

Background

1

Page 3: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Theme: How we can improve kdump

reliability?

Why does kdump fail?

How can we fix it?

Theme of this presentation

2

Page 4: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Technical terms

Overview of kdump

Why does kdump fail?

Solution

Implementation

Summary

Table of Contents

3

Page 5: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Technical terms

4

Page 6: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

High performance, general purpose I/O

interconnect [01]

PCI express

Root Complex

Endpoint

Upstream Port

Downstream Port

Switch

Root Port CPU

Memory

5

Page 7: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

physical address

Copyright 2013 FUJITSU LIMITED

input/output memory management unit

Convert device address to physical address

on DMA.

IOMMU

Memory

IOMMU MMU

Device CPU

Virtual DMA

address Virtual memory

address

6

Page 8: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Intel VT-d feature

The interrupt-remapping architecture

enables system software to control and

censor external interrupt requests [02]

Interrupt remapping

Interrupt Remapping

table

Deliver to appropriate

CPU

7

Page 9: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Overview of kdump

8

Page 10: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Memory

1st kernel

2nd kernel

Storage

Copyright 2013 FUJITSU LIMITED

Kdump - Kexec-based Kernel Crash

Dumping Mechanism

What is kdump?

Panic

Switch

Save

1. 1st kernel boots up

2. Panic in 1st kernel

3. Switch to 2nd kernel

4. 2nd kernel boots up

5. Save memory to the

storage

9

Page 11: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

More reliable than traditional crash dump

Traditional dump

• Works in insane kernel

• Dumping in panic situation is not safe

Kdump

• Works in newly booted sane kernel

• Dumping in almost clean situation is safe!

But there are some cases that kdump fails,

why?

Advantage of kdump

10

Page 12: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Why does kdump fail?

11

Page 13: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Human error

wrong configuration

disk to save memory is full

etc.

Problem on kernel switching

Kdump is disturbed by DMA/interrupt from

device

Reasons of kdump failure

12

Page 14: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Problem on kernel switching

1st kernel

Device

DMA

/interrupt

2nd kernel

Device

? ? ?

1. Device is working in first kernel

2. Device is still working after kernel switching,

DMA/interrupt continues

3. Second kernel cannot handle it, it causes

problem

13

Page 15: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Interrupt remapping error

In normal case, interrupt is delivered correctly

based on remapping table

Example(1)

1st kernel Device

Remapping

table

(3) Deliver (2) Interrupt

(1) I/O req.

14

Page 16: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Interrupt remapping error

Remapping fault because table was cleared

On certain platform system hangs up due to SMI

Fixed by PCI quirk on Intel chipset

Example(1)

1st kernel Device

Remapping

table

(4) Interrupt

(1) I/O req.

2nd kernel

(2) Panic

(3) Clear Error

15

Page 17: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

IOMMU error

Almost same as interrupt remapping error

1. Table for IOMMU to convert address is cleared

on second kernel boot

2. DMA by devices causes IOMMU error

3. Driver error/PCI SERR

4. Driver does not work correctly, kdump fails

Not fixed yet, I’m working on this.

Example(2)

16

Page 18: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Theme: How we can improve kdump

reliability?

Why does kdump fail?

• Devices continue working after switching to second

kernel

How can we fix it?

Back to Theme

17

Page 19: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Theme: How we can improve kdump

reliability?

Why does kdump fail?

• Devices continue working after switching to second

kernel

How can we fix it?

Back to Theme

18

Page 20: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Theme: How we can improve kdump

reliability?

Why does kdump fail?

• Devices continue working after switching to second

kernel

How can we fix it?

• Stop all devices on second kernel boot

Back to Theme

19

Page 21: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Solution

20

Page 22: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Clear bus master bit in PCI config reg.

Result: DMA stopped, but IOMMU error after

driver loading

Driver set this bit again to enable DMA

PCIe function level reset

Result: Same as above

This is optional, may not be supported

PCIe hot reset

Result: Works!

Stop devices - How?

21

Page 23: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

A reset propagated in-band across a Link

using a Physical Layer mechanism [01]

How to trigger hot reset

Setting the Secondary Bus Reset bit of the

Bridge Control register in downstream port

PCIe hot reset

Endpoint

PCIe switch Downstream port

Set bit

Reset

22

Page 24: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Reset all PCIe endpoints

Don’t reset:

Display devices

• A monitor blacks out when its controller is reset

Legacy PCI devices

• To simplify algorithm

• Recently not used

Stop devices – Which one?

VGA

PCI

23

Page 25: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Stop devices - When?

1st

kernel

2nd

kernel

PCI

init IOMMU

init

Driver

init Panic

Time

Need to stop devices

before iommu init and

driver init

24

Page 26: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Stop devices - When?

PCI

init IOMMU

init

Driver

init

1st

kernel

2nd

kernel

Panic

Time

Stop before jumping into

second kernel

Not safe

25

Page 27: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Stop devices - When?

PCI

init IOMMU

init

Driver

init

1st

kernel

2nd

kernel

Panic

Time

Stop as soon as second

kernel starts

No pci_dev struct , difficult to

implement

MMCONFIG is not ready

26

Page 28: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Stop devices - When?

PCI

init IOMMU

init

Driver

init

1st

kernel

2nd

kernel

Panic

Time

Stop after PCI initialization

Good!

27

Page 29: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Another idea: Reset devices when error is

detected

Reset devices in IOMMU error handler

Reset devices in AER error handler

Result: driver does not work correctly

because device is reset during driver loading

Stop devices - When?

28

Page 30: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Reset devices by PCIe hot reset

Reset all PCIe endpoints except display

devices and legacy PCI devices

Reset after PCI initialization

Summary of solution

29

Page 31: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Implementation

30

Page 32: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Add new initcall handler

Type Handler Work

arch_initcall pci_arch_init Setup MMCONFIG

subsys_initcall acpi_init Setup PCI

fs_initcall_sync reset_pcie_endpoints Reset devices

rootfs_initcall pci_iommu_init Setup iommu

device_initcall * Setup driver

Initcall is called in this order on boot

New fs_initcall_sync handler is added to

reset devices

31

Page 33: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

reset_pcie_endpoints

1. Find root port/downstream

port whose child is

endpoint except display

device

2. Save PCI config regs of

endpoint

3. Bus reset on the port

4. Restore PCI config regs of

endpoint

Switch

Root Complex

VGA

HBA(endpoint)

32

Page 34: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

reset_pcie_endpoints

Root Complex 1. Find root port/downstream

port whose child is

endpoint except display

device

2. Save PCI config regs of

endpoint

3. Bus reset on the port

4. Restore PCI config regs of

endpoint

VGA

33

Page 35: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

reset_pcie_endpoints

Root Complex 1. Find root port/downstream

port whose child is

endpoint except display

device

2. Save PCI config regs of

endpoint

3. Bus reset on the port

4. Restore PCI config regs of

endpoint

VGA

34

Page 36: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

reset_pcie_endpoints

Root Complex 1. Find root port/downstream

port whose child is

endpoint except display

device

2. Save PCI config regs of

endpoint

3. Bus reset on the port

4. Restore PCI config regs of

endpoint

VGA

35

Page 37: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

reset_pcie_endpoints

Root Complex 1. Find root port/downstream

port whose child is

endpoint except display

device

2. Save PCI config regs of

endpoint

3. Bus reset on the port

4. Restore PCI config regs of

endpoint

VGA

36

Page 38: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Patch posted

v1 https://patchwork.kernel.org/patch/2482291/

v2 https://patchwork.kernel.org/patch/2562841/

Under discussion

Please join in discussion!

Please test this patch!

Upstream status

37

Page 39: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Summary

38

Page 40: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

Kdump sometimes fails because devices are

working on kernel switching

Stopping devices is needed to improve

kdump reliability

Resetting devices by PCIe hot reset after

PCI initialization is effective to fix this

problem

Posted a patch to upstream, still under

discussion

Summary

39

Page 41: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

More testing

HP engineer reported that kernel hangs up after

device reset on certain platform [03]

Improve reset timing

Still unstable window

TODO

PCI

init IOMMU

init

Driver

init Panic

Reset

40

Page 42: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2013 FUJITSU LIMITED

01

PCI Express® Base Specification Revision 3.0

02

Intel® Virtualization Technology for Directed I/O

03

RE: [PATCH] Reset PCIe devices to stop

ongoing DMA

https://lkml.org/lkml/2013/4/30/211

References

PCI Express, PCIe, and PCI are trademarks of PCI-SIG.

Intel is trademarks or registered trademarks of Intel Corporation

41

Page 43: kdump: Improve reliability on kernel switching · More reliable than traditional crash dump Traditional dump •Works in insane kernel •Dumping in panic situation is not safe Kdump

Copyright 2010 FUJITSU LIMITED 42