
KubeVirt - Kubernetes, Virtualization and Your Future Data Center
Itamar Heim, Sr. Director & Fabian Deutsch, Associate Manager
Red Hat
OpenShift Commons Briefing, August 24, 2017

KubeVirt

● Upstream research project
● Converged Kubernetes infrastructure
● Containers and virtual machines
● Still early days, but interesting concepts

http://kubevirt.io/

Background

● KVM - a VM is just a user process
● oVirt - Open Source Enterprise Virtualization
● OpenStack - Infrastructure-as-a-Service (IaaS) Cloud
● Kubernetes - Deployment, scaling, and management of containers

So, If...

● VMs are just user processes
  ○ VMs and containers already share some isolation technologies - SELinux, cgroups

● Kubernetes manages clustered containers, which are user processes
● Can we get to a converged infrastructure?

Why Converged Infrastructure?

● Environments will co-exist over time
  ○ While many new workloads will move to containers, virtualization will remain for the foreseeable future. The same goes for on-premises vs. public cloud

● Unified infrastructure will (should) be easier to maintain and operate, and will reduce costs

● Migrating workloads from VMs to containers happens on the same infrastructure
  ○ Can also benefit from local affinity between VM and container workloads

● VMs can benefit from advanced Kubernetes concepts (load balancing, rolling deployments, etc.) - see the Service sketch after this list

● Enhances Kubernetes on-premises and bare-metal use cases
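
To make the load-balancing point above concrete, here is a minimal sketch (not from the slides) of putting a stock Kubernetes Service in front of a VM. It assumes the VM pod carries a label such as kubevirt.io/domain: testvm; the label key, Service name, and port are illustrative assumptions only.

# Hypothetical sketch: load-balancing traffic to a VM via a stock Service.
# Assumes the VM pod is labeled kubevirt.io/domain: testvm (label key is an assumption).
apiVersion: v1
kind: Service
metadata:
  name: testvm-ssh
spec:
  selector:
    kubevirt.io/domain: testvm   # selects the pod that wraps the VM
  ports:
  - name: ssh
    port: 22
    targetPort: 22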

Use Cases

● Run a container workload in a VM
  ○ Better isolation

● Virtualization as in oVirt, OpenStack, etc.
  ○ Leverage Kubernetes
  ○ Run a full-fledged/featured VM

Not So Fast...

Goals

● Feature-complete virtualization API (without contradictions and container-workload-related limits)
  ○ … for consumption by higher layers (e.g. UI, automation, SDK)

● Well-behaved citizen on Kubernetes (technically and community-wise)

● Production stable on all levels (runtime up to public API)

● With a native look and feel

Research focus

● What virtualization API is needed?

● Where should the runtime (libvirt/KVM) reside and how should it work?

● How should the integration into Kubernetes look?

● Kubernetes Gaps

Prior art

● virtlet
  ○ Pod API to run VMs

● runv
  ○ Pod API to run pods inside VMs for isolation

● Clear Containers / oci-cc-runtime
  ○ Pod API to run pods inside VMs for isolation

API

● Virtlet, runv, and Clear Containers derive the VM from the pod spec
  ○ Allows creating VMs for isolation
  ○ Gets cumbersome when creating ABI-stable or specific VMs

● Dedicated API for virtualization
  ○ CRD now, working on a User API Server for custom (sub-)resource types
  ○ Allows defining VM resources and actions
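
A rough illustration of the CRD route (a sketch, not the project's actual manifest): registering the VM type could look like the following. The group and version match the example on the next slide; the plural/singular names and the CRD API version (apiextensions.k8s.io/v1beta1, current at the time) are assumptions.

# Hypothetical sketch: registering VM as a custom resource type.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: vms.kubevirt.io          # must be <plural>.<group>
spec:
  group: kubevirt.io
  version: v1alpha1
  scope: Namespaced
  names:
    plural: vms
    singular: vm
    kind: VM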

Current API (Example)

kind: VM
apiVersion: kubevirt.io/v1alpha1
metadata:
  name: testvm
spec:
  nodeSelector:
    kubernetes.io/hostname: master
  domain:
    name: testvm
    type: qemu
    memory:
      unit: MB
      value: 64
    vcpus:
      value: 4
    devices:
      disks:
      - type: PersistentVolumeClaim
        source:
          name: disk-01

(Slide callouts: parts of this spec map to the libvirt portion, Kubernetes scheduling, the Kubernetes pod, and Kubernetes volumes.)
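
A hedged usage sketch for the spec above, assuming it is saved as vm.yaml and that the custom type was registered with the plural name vms (an assumption):

$ kubectl create -f vm.yaml
$ kubectl get vms
$ kubectl describe vm testvm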

Runtime

● Virtlet, runv, and Clear Containers use a CRI/OCI runtime on the node level

● Containerized runtime
  ○ Libvirt as the underlying runtime - proven, stable, and feature-rich
  ○ Libvirt and qemu in a container
  ○ VMs are moved into the resource group of pods for proper accounting
  ○ Independent runtime life-cycle, no node OS dependencies (except kmods)
  ○ VMs become a (nearly) container workload
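
As a sketch of what a containerized libvirt runtime can look like on every node (image name, API version, and mounts are assumptions, not the project's actual manifest):

# Hypothetical sketch: libvirtd shipped as a privileged per-node DaemonSet container.
apiVersion: extensions/v1beta1   # DaemonSet API group commonly used at the time
kind: DaemonSet
metadata:
  name: libvirt
spec:
  template:
    metadata:
      labels:
        app: libvirt
    spec:
      containers:
      - name: libvirtd
        image: kubevirt/libvirt      # image name is an assumption
        securityContext:
          privileged: true           # needed to access /dev/kvm and manage cgroups
        volumeMounts:
        - name: dev
          mountPath: /dev
      volumes:
      - name: dev
        hostPath:
          path: /dev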

Integration

● Virtlet, runv, and Clear Containers integrate (mainly) on the node level

● Kubernetes Add-On; now CRD → Custom API Server is a WIP
  ○ VM-specific user API server for API server aggregation
  ○ CRD/UAS is the recommended way to extend the Kubernetes API
  ○ Permits reusing stock Kubernetes resources on the API level
    ■ E.g. volumes backed by a qemu-supported protocol can be consumed directly by qemu, without going through the kubelet
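
For example, the disk-01 PersistentVolumeClaim referenced by the VM spec earlier is just a stock Kubernetes object; a minimal sketch (size and access mode are assumptions):

# Hypothetical sketch: a stock PVC that a VM disk can reference by name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: disk-01
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi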

Architecture

Example: Launching a VM

$ kubectl create -f pod.yaml
$ kubectl create -f vm.yaml

(Architecture diagram, condensed: virt-controller runs at the cluster level as a Deployment; virt-handler and libvirtd run on every node via a DaemonSet; VM objects are stored through the VM CRD/UAS; the qemu process for a VM lives only in the cgroup of its VM pod.)

The build slides step through: Awaiting … → Create VM → Watch VM → Schedule pod → Launch domain → Repeat

1: Controller watches VM life-cycle
2: Controller creates a pod for the VM, kube-scheduler schedules the pod
3: Handler monitors and reacts to VM object state changes
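
One hedged way to observe this flow from the command line (the plural resource name vms is the same assumption as before):

$ kubectl create -f vm.yaml        # 1: virt-controller notices the new VM object
$ kubectl get pods                 # 2: a VM pod appears and is scheduled to a node
$ kubectl get vms testvm -o yaml   # 3: the VM object's state is updated as the domain is handled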

Technical gaps

● Depending on the area, a lot of functionality is already there.

● Gaps need to be solved upstream (if possible)

○ Technically

○ Conceptually

Detailed Gaps (examples)

● Clustering
  ○ Resource protections (exclusiveness, fencing)
● Host
  ○ Host life-cycle management
  ○ Device management
● Compute
  ○ Process-aware CPU/NUMA pinning
  ○ Dynamic SLA (hotplug CPU/RAM)
● Network
  ○ Layer 2 network vs. pod IP
● Storage
  ○ Multipath
  ○ Advanced operations (cloning, snapshots)
● Scheduling
  ○ Resource driven
  ○ Custom metrics
  ○ Rescheduling
  ○ Modularity of scheduling units
● Infrastructure
  ○ Kubectl plugins
  ○ Add-on formalization
  ○ UAS-native object storage in Kubernetes

Thoughts so far

● API granularity and focus
  ○ How to: simple-as-a-pod (few data points), workload-types-for-free-for-VMs

● Still need to think through the kubelet/libvirt relationship for VM process ownership (cgroups, sVirt, NUMA pinning, PCI passthrough)
  ○ Collides with Kube's process and resource ownership model

● Kube still has gaps to deliver all required functionality
● The Operator pattern works out nicely

Summary: Continue with research

● Different use cases - we are focusing on the full virt/cloud one
● While early, the potential for convergence is promising
● Looks like a win-win - benefits for both Kubernetes and virt
● But need to balance between virt expectations and the Kubernetes way of doing things
● POC is easy - an enterprise-class solution is hard

https://github.com/kubevirt/demo (with minikube)

$ minikube start --vm-driver kvm --network-plugin cni

$ git clone -b openshift-commons-briefing-201708 \
    https://github.com/fabiand/kubevirt-demo.git

$ cd kubevirt-demo

$ bash run-demo.sh

Questions?

Thank you.

Join us at
https://github.com/kubevirt
@kubevirt
