PVHVM Linux guest why doesn't kexec work? Vitaly Kuznetsov Red Hat Xen Developer Summit, 2015

PVHVM Linux guest

why doesn't kexec work?

Vitaly Kuznetsov

Red Hat

Xen Developer Summit, 2015

Why?

● We support Red Hat Enterprise Linux.

● Bare hardware, virtualized and cloud environments, ...

● Kernel issues happen.

● Analyse stack traces.

● In complicated cases use kdump!

Kexec/kdump

● “kexec … is a mechanism of the Linux kernel that allows "live" booting of a new kernel "over" the currently running kernel”

● Kdump uses kexec:
  ● Some memory is reserved at boot (crashkernel=).
  ● Crash kernel/initrd are loaded to the area.
  ● On crash we trigger crash kernel's boot.
  ● Crash initrd dumps all domain's memory and reboots.
  ● You have crash file to analyse! (profit!!!)
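The reservation step depends on the crashkernel= boot parameter. As a rough illustration only, the sketch below parses the simple `crashkernel=size[KMG][@offset[KMG]]` form from a kernel command line. The function names `parse_crashkernel` and `_to_bytes` are made up for this example, and other forms (ranges, `auto`) are deliberately not handled and raise `ValueError`.

```python
import re

# Size suffixes accepted by the simple crashkernel=size[KMG][@offset[KMG]] form.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def _to_bytes(text):
    """Convert a string such as '256M' to a byte count (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.upper())
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def parse_crashkernel(cmdline):
    """Return (size, offset) in bytes, or None if nothing is reserved.

    offset is None when the kernel is left to choose the location.
    Only the simple form is handled in this sketch.
    """
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            value = token[len("crashkernel="):]
            size, _, offset = value.partition("@")
            return _to_bytes(size), (_to_bytes(offset) if offset else None)
    return None
```

For example, `parse_crashkernel("ro root=/dev/vda1 crashkernel=256M")` yields a 256 MiB reservation with no fixed offset.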

Doesn't work for Xen guests

Issues with Kexec on PVHVM

● Previously used structures cause problems, no good way to transfer knowledge to kexec kernel.
  ● and we need these interfaces working!

● Xen/guest interfaces we need to re-establish:
  ● shared_info frame (XENMAPSPACE_shared_info)
  ● VCPU_info (VCPUOP_register_vcpu_info)
  ● Event channels (EVTCHNOP_bind_*, ABI)
    ● + Emuirq/pirq mappings (PHYSDEVOP_map_pirq)
  ● Granted pages

shared_info page:

● 4k page, belongs to Xen hypervisor.

● Required for events, vcpu_info for first 32 VCPUs lives here.

● Upon boot guest chooses one of its pages to sacrifice.
  ● XENMEM_add_to_physmap(XENMAPSPACE_shared_info) frees guest's frame and mounts shared_info there.

● kexec kernel does the same for another frame → we get a hole as shared_info is being unmapped from its previous place.
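The hole can be pictured with a toy model of the guest physmap. This is only a sketch of the behaviour described above, not hypervisor code: `ToyPhysmap`, its methods and the string labels are invented for illustration.

```python
SHARED_INFO = "xen:shared_info"   # label for the hypervisor-owned page


class ToyPhysmap:
    """Toy guest physmap: gfn -> label of the page backing it."""

    def __init__(self, nr_pages):
        self.nr_pages = nr_pages
        self.pages = {gfn: f"ram:{gfn}" for gfn in range(nr_pages)}

    def map_shared_info(self, gfn):
        """Model of XENMEM_add_to_physmap(XENMAPSPACE_shared_info)."""
        # shared_info lives at one gfn only, so it leaves its previous
        # place. Nothing is put back there: that gfn becomes a hole.
        for old_gfn, page in list(self.pages.items()):
            if page == SHARED_INFO:
                del self.pages[old_gfn]
        # The guest's own frame at the new gfn is freed and replaced.
        self.pages[gfn] = SHARED_INFO

    def holes(self):
        """Return the gfns that no longer have any page behind them."""
        return [g for g in range(self.nr_pages) if g not in self.pages]
```

In this model the first kernel calling `map_shared_info(3)` leaves no hole, while the kexec kernel calling `map_shared_info(5)` afterwards leaves gfn 3 unbacked.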

Event channels:

● Already bound event channels
  ● “(XEN) event_channel.c:370:d2v0 EVTCHNOP failure: error -17”

● 2 level → FIFO ABI switch at boot
  ● Mapped control block, event array pages.

● Some INTERDOMAIN channels are being set up by the toolstack:
  ● Xenstore, xenconsole, ...

● EVTCHNOP_reset resets everything, there is no way back.
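Error -17 in the log line above is -EEXIST on Linux. The toy model below sketches why a kexec kernel runs into it and why a full reset is not a cure. It is an assumption-laden illustration: `ToyEventChannels`, the port numbers and the VIRQ value are hypothetical, and real binding rules are more involved.

```python
import errno


class ToyEventChannels:
    """Toy per-domain event channel table."""

    def __init__(self):
        # Channels created by the toolstack before the guest runs.
        self.interdomain = {"xenstore": 1, "xenconsole": 2}
        self.virq = {}                 # (virq, vcpu) -> port
        self.next_port = 3

    def bind_virq(self, virq, vcpu):
        """Return a new port, or a negative errno value on failure."""
        if (virq, vcpu) in self.virq:
            return -errno.EEXIST       # still bound by the previous kernel
        port = self.next_port
        self.next_port += 1
        self.virq[(virq, vcpu)] = port
        return port

    def reset(self):
        """Model of EVTCHNOP_reset: every channel is closed."""
        self.virq.clear()
        # The guest cannot recreate these; only the toolstack could.
        self.interdomain.clear()
```

After `reset()` the VIRQ can be bound again, but the xenstore and console channels are gone, which is the "no way back" problem.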

Grant pages:

● Memory sharing mechanism in Xen.

● We can't do anything guest-side:
  ● Forcibly unmapping a page from backend domain will crash it.
  ● Requesting new pages requires additional memory.
  ● Some grants are “persistent”.

● Maybe not-an-issue for kdump because its memory region is separated but
  ● We still need functional backends for kexec kernel!

Possible solutions

“Obvious solution”

● Implement set of hypercalls to tear all interfaces down:
  ● reset_vcpu_info
  ● evtchn_switch_to_2l
  ● unmap_shared_info
  ● do_something_with_granted_pages
  ● …

● Good from “if there is a way to set something up there should be one to tear it down” PoV.

● Good for hypervisor testing :-)

“Obvious solution”

● Issues:
  ● Domain needs to follow a special protocol – what if it doesn't?
  ● Granted pages story is complicated.
  ● Not all bits are being set up by the domain.
  ● Too many possible issues (including security).

“New domain with the same memory”

● Destroy the original domain leaving its memory intact.

● Create new domain, reassign all memory pages, copy vcpu contexts.

● Benefits:
  ● No cumbersome teardown required!
  ● Migration path is being reused!
  ● Supportability: new interfaces/objects should “just work”.

“New domain with the same memory”

● Issues:
  ● Memory reassignment appears to be cumbersome :-(
    ● Superpages, PoD, mem_access issues.
    ● No m2p on ARM.
  ● Non-trivial toolstack part repeating migration code.
  ● Too complicated.

“Reset everything”

● No cumbersome memory reassignment.

● Explicit list of interfaces to reset with one hypercall:
  ● shared_info, vcpu_info, event channels, pirq_to_emuirq, ioreq servers.

● Toolstack involvement required:
  ● Restart device model.
  ● Reopen xenstore/xenconsole event channels.
  ● ...

● Hypervisor maintainers like it :-)
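The shape of this approach can be sketched as a single operation that walks an explicit list and leaves guest memory alone. This is a toy model under stated assumptions, not the patch series: `ToyDomain`, `soft_reset` and the state strings are hypothetical names.

```python
# Explicit list taken from the slide; the identifiers are illustrative.
RESET_LIST = ("shared_info", "vcpu_info", "event_channels",
              "pirq_to_emuirq", "ioreq_servers")


class ToyDomain:
    """Toy domain: guest memory plus hypervisor-side interface state."""

    def __init__(self):
        self.memory = {0: b"old kernel", 1: b"kexec kernel image"}
        self.state = {name: "set up by previous kernel" for name in RESET_LIST}

    def soft_reset(self):
        """One call resets every listed interface; memory is not touched."""
        for name in RESET_LIST:
            self.state[name] = None
```

The toolstack-side steps (restarting the device model, reopening the xenstore and console channels) are outside this model.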

“Reset everything”

● Granted pages - let's do (almost) nothing!
  ● Remove the domain from xenstore and add it back – all backends are supposed to release all mappings.
  ● Xenconsoled doesn't release its mapping (but that's fine).
  ● Special debug print to find future issues.
  ● Hunt for misbehaving backends! (if there are such)
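A toy model of the xenstore remove/add cycle with a debug print for leftover mappings might look like the following. Everything here is an illustrative assumption: `ToyBackend`, `cycle_domain_in_xenstore` and the message format are invented and do not reflect the actual debug output.

```python
class ToyBackend:
    """Toy backend holding grant mappings per frontend domain id."""

    def __init__(self, name, releases_on_removal=True):
        self.name = name
        self.releases_on_removal = releases_on_removal
        self.mappings = {}            # domid -> set of grant references

    def map_grant(self, domid, gref):
        self.mappings.setdefault(domid, set()).add(gref)

    def on_domain_removed(self, domid):
        # A well-behaved backend drops every mapping once the frontend's
        # xenstore entries disappear.
        if self.releases_on_removal:
            self.mappings.pop(domid, None)


def cycle_domain_in_xenstore(backends, domid):
    """Remove the domain and add it back; return backends that kept mappings."""
    for backend in backends:
        backend.on_domain_removed(domid)
    leftovers = [b.name for b in backends if b.mappings.get(domid)]
    for name in leftovers:
        print(f"debug: {name} still maps pages of domain {domid}")
    return leftovers
```

With one well-behaved block backend and a console daemon that keeps its mapping, only the latter is reported, matching the xenconsoled case noted above.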

Current status and future work

Current status and future work

● [PATCH v10 00/11] “toolstack-assisted approach to PVHVM guest kexec” is out waiting for reviewers!
  ● … and testers too!

● PVH (as "HVM without device model") should "just work".
  ● Not tested, minor issues are possible.

● ARM-specific part is -ENOSYS stub for now.
  ● shared_info page needs handling (same as x86).
  ● Some GIC cleanup?

Thank you!

Questions?

Vitaly Kuznetsov