Unikernels: the Rise of the Library Hypervisor

Anil Madhavapeddy, @avsm
Mindy Preston, @yomimono
Martin Lucina
plus the MirageOS and Docker for Mac/Win teams

Docker Inc, @docker, with contributions from IBM

Docker Distributed Systems Summit, 7th October 2016, Berlin, Germany



Conventional hypervisors

• Run full guest operating systems with complex emulation needs.

• Scaffolding for device emulation, instruction emulation, etc.

• Hard to compose into existing infrastructure without wrapping a full hypervisor layer.

[Diagram: Xen Hypervisor on Hardware, hosting Dom0 (qemu, xenstored, xenconsoled) and DomU]

CVE-2016-3710: VGA emulation missing bounds checks causes exploit.

CVE-2016-5403: unbounded virtio memory usage causes DoS.

CVE-2016-3672: unrestricted qemu logging causes DoS.

CVE-2015-8554: qemu-dm buffer overrun in MSI-X causes exploit.

CVE-2015-7504: heap overflow in pcnet emulator causes exploit.

How can distributed systems use hardware protection more flexibly and composably?

Recap: Unikernels

• "Library operating systems" break kernels into libraries.

• Link libraries with a boot layer, scheduler and application.

• Portable microservices that boot directly on hypervisors or Unix.

[Diagram labels: Application (Configuration, Business Logic); Libraries (HTTP, JSON, SSL, TCP/IP, Xen Devices; or Unix, libev, musl libc); deployment targets: Xen on Hardware, or a Docker App on Linux on Hardware]

• Many benefits are lost when deploying on existing clouds.

• Tiny binaries (200k) still require the scaffolding of a full OS to boot.

• Difficult to manage the hypervisor from inside a container, as full host privilege is needed.

Library Hypervisors

• Extend the "kit" model and break down hypervisor functionality into libraries.

• Expose core functionality (CPU and memory) as a library; other pieces (device emulation) are optional.

• Benefit: huge reduction in the trusted computing base (TCB), and a better fit to container-native infrastructure with privilege dropping.

• Drawback: no existing support in operating systems.

But let's take a closer look!

What has changed?

• OSX Hypervisor framework: xHyve (xhyve.org), derived from FreeBSD bHyve (bhyve.org), and HyperKit (github.com/docker/hyperkit), used by Docker for Mac.

• Linux /dev/kvm: kvmtool, novm, and ukvm, used by MirageOS 3.

Docker for Mac

Aiming for a native OSX experience that works with existing developer workflows.

• Easy drag and drop installation, and autoupdates to get the latest Docker.

• Secure, sandboxed virtualisation architecture without elevated privileges.

• Native networking support, with VPN and network sharing compatibility.

• File sharing between container and host: uid mapping, inotify events, etc.

Virtualisation

• Uses the new HyperKit framework, which is in turn based on xHyve and FreeBSD's bHyve.

• Sandbox friendly: processes largely run as non-root, with privileges of the local user.

[Diagram: in the OSX kernel, Hypervisor.framework exposes hardware virtualisation (VMX, nested paging) to a userspace process, which runs one thread per vCPU, traps on I/O pages, and manages ACPI and PCI devices. That process hosts a Linux kernel with VirtIO IPC, VirtIO Block and VirtIO Net devices and an Alpine Linux userspace with the latest Docker preconfigured. Other labels: QCow2, VPNKit, logs redirected to OSX host.]

• Embeds Linux: includes a lightweight Alpine Linux distribution optimised for fast boot and stateless container operation.

$ docker info
Containers: 358
 Running: 13
 Paused: 0
 Stopped: 345
Images: 485
Server Version: 1.11.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host
Kernel Version: 4.4.9-moby
Operating System: Alpine Linux v3.3
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.858 GiB

HyperKit library structure

• In HyperKit, most functionality is linked as a library.

• If an application doesn't need a protocol, that code is not linked and so is not part of the trusted computing base.
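The library model above can be illustrated with a small sketch. This is not HyperKit's API: the `Monitor` class, its `attach` and `handle_io` methods, and the `virtio_net` handler are all hypothetical names invented here. The point is only that a device which was never attached has no code path that a guest request could reach.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Monitor:
    """Hypothetical monitor: CPU and memory are always present, devices are opt-in."""
    memory_mb: int
    vcpus: int = 1
    devices: Dict[str, Callable[[bytes], bytes]] = field(default_factory=dict)

    def attach(self, name: str, handler: Callable[[bytes], bytes]) -> "Monitor":
        # Only devices the application asks for are attached at all.
        self.devices[name] = handler
        return self

    def handle_io(self, name: str, payload: bytes) -> bytes:
        # A request for a device that was never attached is rejected outright:
        # there is no emulation code for it to reach.
        if name not in self.devices:
            raise LookupError(f"device {name!r} is not part of this monitor")
        return self.devices[name](payload)

    def attack_surface(self) -> List[str]:
        # Core pieces plus whatever was explicitly attached.
        return ["cpu", "memory"] + sorted(self.devices)


def virtio_net(frame: bytes) -> bytes:
    # Stand-in for a network backend: echo the frame back.
    return frame


monitor = Monitor(memory_mb=64).attach("virtio-net", virtio_net)
print(monitor.attack_surface())
```

A monitor built this way for a network-only service never contains, for example, VGA emulation, which is the class of code behind several of the CVEs listed earlier.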

Networking

• Want to hide the gory details of virtualisation from the user. The Linux VM should be "invisible".

• Not solving this leads to many user complaints:

  • VPN software and corporate installations do not like bridged virtual machines or custom routing. Result: container traffic cannot connect to the Internet.

  • Services cannot be exposed on localhost or the external interface and are instead on the Linux VM IP address. Result: breaks common web OAuth workflows.

[Diagram: bridged networking. HyperKit runs in userspace on top of Hypervisor.framework in the OSX kernel (hardware virtualisation: VMX, nested paging) and exposes VirtIO IPC, VirtIO Block and VirtIO Net devices to the containers. Ethernet frames from VirtIO Net go into a Bridge backed by an Ethernet Kernel Module.]


Networking

• Challenge: Services publishing ports should be exposed on localhost without needing VM info.

• Solution: VPNKit forwards container port requests to an OSX service which binds them natively on its external interface.

• Benefits:

  • docker run -P on the Mac now works without requiring any knowledge of the VM innards.

  • External OAuth workflows operate with web apps.
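To make the port-publishing idea concrete, here is a minimal sketch of a host-side forwarder: it binds a port on the host and relays each connection to a backend address. It is not VPNKit's implementation (VPNKit is built from MirageOS libraries and talks to the VM over its own channel); the `forward` and `pump` names are hypothetical, and a plain TCP connection stands in for the path into the VM.

```python
import socket
import threading


def pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes one way until src reaches EOF, then signal EOF to dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def forward(listen_port: int, target: tuple) -> socket.socket:
    """Bind listen_port on localhost and relay each connection to target."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", listen_port))
    listener.listen()

    def accept_loop() -> None:
        while True:
            try:
                client, _ = listener.accept()
            except OSError:
                return  # accept failed, e.g. the listener was closed
            try:
                upstream = socket.create_connection(target)
            except OSError:
                client.close()  # backend unreachable: drop this client
                continue
            # One thread per direction keeps the sketch simple.
            threading.Thread(target=pump, args=(client, upstream), daemon=True).start()
            threading.Thread(target=pump, args=(upstream, client), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```

Calling `forward(8080, ("192.0.2.10", 80))` would expose a backend at that (example) address on localhost:8080, so the user never needs to know the backend's address.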


[Diagram: networking with VPNKit. The layers are as before (HyperKit in userspace on Hypervisor.framework, with VirtIO IPC, VirtIO Block and VirtIO Net devices serving the containers), but Ethernet frames from VirtIO Net now go into VPNKit, which uses MirageOS TCP/IP and DNS libraries and a "Socketer" component that talks to ordinary kernel sockets. Source: github.com/docker/vpnkit]

Networking

• Challenge: Deal with custom VPN software on the host that makes it difficult to bridge.

• Solution: VPNKit efficiently reconstructs container traffic into separate TCP/IP flows and translates them into native OSX/Windows sockets.

• Benefits:

  • All network traffic is generated from normal socket calls (e.g. gethostbyaddr) on the Mac, so it interacts well with firewalls, VPNs, and any local security policies.
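The "reconstruct into separate flows" step can be sketched as demultiplexing packets by their connection 4-tuple. This is a deliberately simplified illustration, not VPNKit's code: it assumes well-formed IPv4 packets carrying TCP, ignores sequence numbers, retransmission and the handshake, and the helper names (`parse_tcp`, `demultiplex`, `make_packet`) are hypothetical.

```python
import socket
import struct
from collections import defaultdict
from typing import Dict, Iterable, Tuple

FlowKey = Tuple[str, int, str, int]  # (src ip, src port, dst ip, dst port)


def parse_tcp(packet: bytes) -> Tuple[FlowKey, bytes]:
    """Parse an IPv4 packet carrying TCP; return its flow key and payload."""
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not IPv4")
    ip_header_len = (version_ihl & 0x0F) * 4
    (total_len,) = struct.unpack_from("!H", packet, 2)
    if packet[9] != socket.IPPROTO_TCP:
        raise ValueError("not TCP")
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    src_port, dst_port = struct.unpack_from("!HH", packet, ip_header_len)
    tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
    payload = packet[ip_header_len + tcp_header_len:total_len]
    return (src_ip, src_port, dst_ip, dst_port), payload


def demultiplex(packets: Iterable[bytes]) -> Dict[FlowKey, bytes]:
    """Group payload bytes by flow, in arrival order."""
    flows: Dict[FlowKey, bytearray] = defaultdict(bytearray)
    for packet in packets:
        key, payload = parse_tcp(packet)
        flows[key] += payload
    return {key: bytes(data) for key, data in flows.items()}


def make_packet(src: str, sport: int, dst: str, dport: int, payload: bytes) -> bytes:
    """Build a minimal IPv4+TCP packet for the example (checksums left as zero)."""
    tcp = struct.pack("!HHIIBBHHH", sport, dport, 0, 0, 5 << 4, 0x18, 65535, 0, 0) + payload
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0, 64,
                     socket.IPPROTO_TCP, 0, socket.inet_aton(src), socket.inet_aton(dst))
    return ip + tcp
```

Once traffic is separated this way, each flow's bytes can be written to an ordinary host socket connected to the flow's destination, which is why the host's VPN and firewall see nothing unusual.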

Docker for Mac + unikernels

• Native OSX application; uses HyperKit to virtualise for a domain-specific purpose ("docker run").

• Links MirageOS unikernel libraries for networking and storage translation between OS boundaries.

• The library approach let us glue together these components really easily.

• Docker for Mac is quite a complex distributed system internally, but (hopefully) hidden from the user.

MirageOS 3 + Solo5

• Unikernels have been gathering pace; the next challenge is to make them easily deployable.

• Build is handled via Docker, but docker run shouldn't need privileges (e.g. to start a VM).

• MirageOS 3 has a new library hypervisor for Linux, developed by IBM, Docker and Cambridge University contributors.

mirage.io

MirageOS 3 + Solo5

• Source: https://github.com/Solo5/solo5

• Runs as a Unix process and opens /dev/kvm for hardware isolation.

• ukvm is a small, modular monitor that links only what is needed. Can be 10k in size!

• Can run privilege separated: one process opens /dev/kvm, then drops privileges and executes the unikernel.

• Boot times are the same as process fork times, since all the device setup is handled in-process.
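The privilege-separation pattern in the bullets above (open the device while privileged, drop privileges, then execute) can be sketched as follows. This is a generic Unix illustration rather than ukvm's actual launcher: the `launch` function is hypothetical, and passing the descriptor number as the last command-line argument is an assumption made for the example.

```python
import os


def launch(device_path: str, argv: list, uid: int, gid: int) -> None:
    """Open the device while privileged, drop privileges, then exec argv."""
    fd = os.open(device_path, os.O_RDWR)
    os.set_inheritable(fd, True)   # keep the descriptor open across exec
    if os.geteuid() == 0:
        os.setgroups([])           # drop supplementary groups first
        os.setgid(gid)             # group before user: after setuid this would fail
        os.setuid(uid)
    # The new program inherits only the already-open descriptor, not the
    # right to open the device again. os.execv does not return on success.
    os.execv(argv[0], argv + [str(fd)])
```

For example, `launch("/dev/kvm", ["/path/to/monitor"], uid, gid)` (with a placeholder path) would hand an unprivileged monitor a ready-to-use KVM descriptor. When the caller is not root, the sketch skips the drop and simply executes the program.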

MirageOS 3 + Solo5

[Figure not preserved. Source: Dan Williams and Ricardo Koller, IBM Research, HotCloud 16]

MirageOS 3 + Solo5

• Due for stable release in the next month.

• Intended to be a "unikernel template" for other projects to share hypervisor code.

• Liberally licensed under BSD/Apache2/ISC to encourage adoption and embedding.

• BoF and tutorials tomorrow to demonstrate it. Developers are all here and hacking!


Demo!

How can distributed systems use hardware protection more flexibly and composably?


Questions?

Download free at docker.com

Twitter: @avsm

https://github.com/docker/hyperkit

https://github.com/docker/vpnkit

https://github.com/docker/datakit

https://github.com/mirage/

We will be hacking tomorrow!


Backup Slides

Filesystem Sharing

• Challenge: Share an arbitrary OSX directory tree into a Linux container without requiring extensive modification of either side.

• Solution: Use a FUSE forwarding layer and translate Linux filesystem calls to OSX equivalents.

[Diagram: a container VOLUME on the Linux host is connected via FUSE to com.docker.osxfs on the OSX host, which tracks extra metadata and translates requests to OSX filesystem calls]

Filesystem Sharing

• Challenge: Need filesystem activation so that events on the Mac wake up container servers and vice versa.

• Solution: osxfs uses the FSEvents API and injects inotify activation events into the container.

[Diagram: a container VOLUME on the Linux host is connected via FUSE to com.docker.osxfs on the OSX host. FSEvents watches open files; events from Linux cause OSX apps to wake up.]
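The event translation described above can be pictured as a mapping from FSEvents per-item flags to inotify mask bits. The table below is an illustrative guess at such a mapping, not the one osxfs uses; the function name `to_inotify_mask` is hypothetical. The constant values are taken from the public FSEvents and inotify headers.

```python
# inotify mask bits, as defined in <sys/inotify.h>
IN_MODIFY = 0x002
IN_ATTRIB = 0x004
IN_CREATE = 0x100
IN_DELETE = 0x200

# FSEvents per-item flags, as defined in <CoreServices/FSEvents.h>
ITEM_CREATED = 0x0100         # kFSEventStreamEventFlagItemCreated
ITEM_REMOVED = 0x0200         # kFSEventStreamEventFlagItemRemoved
ITEM_INODE_META_MOD = 0x0400  # kFSEventStreamEventFlagItemInodeMetaMod
ITEM_MODIFIED = 0x1000        # kFSEventStreamEventFlagItemModified

# Assumed mapping for illustration only.
TRANSLATION = {
    ITEM_CREATED: IN_CREATE,
    ITEM_REMOVED: IN_DELETE,
    ITEM_INODE_META_MOD: IN_ATTRIB,
    ITEM_MODIFIED: IN_MODIFY,
}


def to_inotify_mask(fsevents_flags: int) -> int:
    """Translate a set of FSEvents flags into the corresponding inotify mask."""
    mask = 0
    for flag, bit in TRANSLATION.items():
        if fsevents_flags & flag:
            mask |= bit
    return mask


# A file that was created and then written to on the Mac.
print(hex(to_inotify_mask(ITEM_CREATED | ITEM_MODIFIED)))
```

Renames are left out on purpose: inotify reports them as a paired IN_MOVED_FROM and IN_MOVED_TO, which a single FSEvents rename flag does not map onto directly.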


Networking

[Diagram: VPNKit data path. A container (RUN <...>) on the Linux host sends traffic over an Ethernet bridge to com.docker.hyperkit-net on the OSX host, which reconstructs the traffic into TCP flows and translates them to OSX socket calls. Other labels: DHCPv4, NTP.]

Networking

[Diagram: port publishing between the OSX Host and the Linux Host. Labels: Container (EXPOSE, RUN <...>), Privileged Port Service, Port Service, VSock Binder, VSock Listener, Userland Proxy.]

Multi-CPU architectures

$ docker run resin/armv7hf-debian uname -a

Linux 7ed2fca7a3f0 4.1.12 #1 SMP Tue Jan 12 10:51:00 UTC 2016 armv7l GNU/Linux

$ docker run justincormack/ppc64le-debian uname -a

Linux edd13885f316 4.1.12 #1 SMP Tue Jan 12 10:51:00 UTC 2016 ppc64le GNU/Linux