Container & Kubernetes


Written by Ted Jung (jongnag@gmail.com)(Cloud Native Engineer)

I. Base techs (container)
• FS
• CGroups
• Namespaces
• COW

II. Kubernetes (service networking)

What is a Container?

A lightweight VM? Not quite: it is not really like a VM.

1. Uses the host kernel
2. Does not need to boot a different OS
3. Does not have its own modules
4. Does not need init as PID 1

It is just normal processes on a host machine.

What is a Container?

Containers wrap a piece of software in a complete filesystem that contains everything it needs to run:
• Code
• Runtime
• System tools
• System libraries
Anything you can install on a server.

This guarantees that it will always run the same, regardless of the environment it is running in.

VM vs. Container

[Figure: two stacks side by side. VM: Infrastructure, Operating system, Hypervisor, then for each of App1 to App3 a Guest OS with its Bins/Libs. Container: Infrastructure, Operating system, Docker Engine, then for each of App1 to App3 only its Bins/Libs.]

• Share the kernel with other containers
• Run as isolated processes in user space
• Docker containers are not tied to any specific infrastructure

What is Docker?

[Figure: container technologies: lmctfy, openvz, zone, libcontainer, lxc, rkt]

Why Docker?

• Easy to use: simple and accessible tooling

• High degree of reuse and extensibility: stackable file system

Before going any further...

• FS
• Cgroups
• Namespaces

Base tech of container (AUFS)

A group of branches, combined in order:
- a branch is a single directory
- each branch is stored in a directory on the host
- at least a single read-only branch; there can be many read-write branches

[Figure: branches labeled Read-only, Read-write, Read-write, Read-write]

Base tech of container (AUFS)

Mount point
With AUFS, the mount point of a container is:
/var/lib/docker/aufs/mnt/$CONTAINER_ID/

It is only mounted when the container is running.

AUFS branches (read-only and read-write) are in:
/var/lib/docker/aufs/diff/$CONTAINER_OR_IMAGE_ID

Base tech of container (AUFS)

e.g. Create a container

Container = a group of branches. Places to look on the host:
- /proc/mounts
- /sys/fs/aufs/si_XXXX/br*
- /var/lib/docker/aufs/diff/XXX

[Figure: host view and container view]

Base tech of container (AUFS)

A file (container / host)

[Figure: deleting a container, shown from the container side and from the host side]

Base tech of container (AUFS)

Docker v1.10: content-addressable storage model

Ubuntu:15.04 image
  Container layer: thin R/W layer
  Image layers (R/O):
    C84bfc126a2     188 MB
    D14bfc54ea1     194.5 KB
    c80179960767    1.895 KB
    6d45a3841788    0 B

- The Docker storage driver enables and manages both the image layers and the container layer. It stacks the layers and provides a single unified view.

- Location: /var/lib/docker/


Layer IDs: from random UUIDs to cryptographic content hashes
• Security
• Avoid ID collisions
• Guarantees data integrity
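A minimal sketch of why content hashes avoid ID collisions and protect integrity, assuming SHA-256 as the hash. The layer contents here are made-up byte strings; real Docker hashes layer archives and image configuration, not short strings like these.

```python
import hashlib


def layer_id(content: bytes) -> str:
    """Return a content-addressable ID: the SHA-256 digest of the bytes."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def verify(content: bytes, expected_id: str) -> bool:
    """Integrity check: recompute the hash and compare it with the ID."""
    return layer_id(content) == expected_id


# Hypothetical layer contents, for illustration only.
layer_a = b"bin/ls version 1"
layer_b = b"bin/ls version 1"   # same bytes as layer_a
layer_c = b"bin/ls version 2"   # different bytes

print(layer_id(layer_a) == layer_id(layer_b))  # identical content, identical ID
print(layer_id(layer_a) == layer_id(layer_c))  # different content, different ID
print(verify(layer_c, layer_id(layer_a)))      # tampered content fails the check
```

Unlike a random UUID, the ID can be recomputed by anyone who holds the content, so a mismatch reveals corruption or tampering.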

Storage drivers: AUFS, Btrfs, Device Mapper, OverlayFS, ZFS

File modification (create, delete, update) steps:

1. Search through the image layers for the file, top-down.

2. Perform a “copy-up” operation, which copies the file to the thin writable layer.

3. Modify the copy of the file.

[Figure: the Ubuntu:15.04 image layers (C84bfc126a2 188 MB, D14bfc54ea1 194.5 KB, c80179960767 1.895 KB, 6d45a3841788 0 B) under the thin R/W layer, shown before and after a 2 B modification on layer 6d45a3841788. The file is copied up into the thin R/W layer and the modification is applied to that copy.]
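The search, copy-up, and modify steps can be sketched as a toy model. This is only an illustration under simplifying assumptions: each layer is a Python dict mapping a path to bytes, and the class name LayeredFS and the sample paths are hypothetical. A real storage driver works on directories or block devices.

```python
class LayeredFS:
    """Toy model: read-only image layers plus one thin writable layer."""

    def __init__(self, image_layers):
        # image_layers[0] is the top-most read-only layer.
        self.image_layers = image_layers
        self.rw_layer = {}

    def _find(self, path):
        """Search the writable layer first, then the image layers top-down."""
        for layer in [self.rw_layer, *self.image_layers]:
            if path in layer:
                return layer
        return None

    def read(self, path):
        layer = self._find(path)
        if layer is None:
            raise FileNotFoundError(path)
        return layer[path]

    def write(self, path, data):
        layer = self._find(path)
        if layer is not None and layer is not self.rw_layer:
            # Copy-up: copy the file into the thin writable layer.
            self.rw_layer[path] = layer[path]
        # Modify the copy (or create a new file) in the writable layer.
        self.rw_layer[path] = data


# Hypothetical layers, for illustration only.
base = {"/etc/hosts": b"127.0.0.1 localhost"}
top = {"/app/run.sh": b"echo hi"}
fs = LayeredFS([top, base])
fs.write("/etc/hosts", b"10.0.0.1 db")
print(fs.read("/etc/hosts"))   # the modified copy from the writable layer
print(base["/etc/hosts"])      # the image layer is unchanged
```

Because the image layers are never written to, several containers can share them safely.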

Base tech of container (CGroups)

Started at Google in 2006 by Paul Menage and Rohit Seth under the name “Process Containers”.
A kernel capability to limit, account for (meter), and isolate resources: CPU, memory, disk I/O, network.

Cgroup controllers:
• Memory controller
• CPUset controller
• CPU accounting controller
• CPU scheduler controller
• Devices controller
• I/O controller for block devices
• Freezer
• Network class controller

Goal: reduce resource contention and increase predictability in performance.

Base tech of container (CGroups)

Controller   Description
memory       Sets limits on RAM and resource usage, and reports the cumulative usage of all processes in the group
cpuset       Binds processes within a group to a set of CPUs and controls migration between CPUs
cpuacct      Reports CPU usage for a group of processes
cpu          Controls the prioritization of processes in the group
devices      Access control lists on character and block devices

Base tech of container (CGroups)

Cgroups (control groups)
• A ‘cgroup’ associates a set of tasks with a set of parameters for one or more subsystems.
• A ‘subsystem’ is a module that makes use of the task grouping facilities provided by cgroups to treat groups of tasks in particular ways.
• A ‘subsystem’ is typically a “resource controller” that schedules a resource and applies per-cgroup limits.
• A ‘hierarchy’ is a set of cgroups arranged in a tree, such that every task in the system is in exactly one of the cgroups in the hierarchy, together with a set of subsystems. Each subsystem has system-specific state attached to each cgroup in the hierarchy. Each hierarchy has an instance of the cgroups virtual filesystem associated with it.

Cgroup subsystems
- Isolation and special controls: cpuset, namespace, freezer, device, checkpoint/restart
- Resource control: cpu (scheduler), memory, disk I/O, network
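To see which hierarchy, subsystems, and cgroup a task belongs to, Linux exposes /proc/<pid>/cgroup, where each line has the form hierarchy-ID:subsystem-list:cgroup-path. The sketch below parses that format. The sample text and the container ID abc123 are made up for illustration.

```python
def parse_proc_cgroup(text):
    """Parse the content of /proc/<pid>/cgroup into a list of dicts."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two colons only; the path may contain colons.
        hierarchy_id, subsystems, path = line.split(":", 2)
        entries.append({
            "hierarchy": int(hierarchy_id),
            "subsystems": subsystems.split(",") if subsystems else [],
            "cgroup": path,
        })
    return entries


# Hypothetical sample; on a Linux host, read open("/proc/self/cgroup") instead.
SAMPLE = """\
4:memory:/docker/abc123
3:cpu,cpuacct:/docker/abc123
2:cpuset:/
"""

for entry in parse_proc_cgroup(SAMPLE):
    print(entry["hierarchy"], entry["subsystems"], entry["cgroup"])
```

Each line is one hierarchy, which matches the definition above: a task is in exactly one cgroup per hierarchy.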

Base tech of container (Namespace)

Namespaces handle the six items in the table below.

Namespace   Isolates
PID         Processes (process IDs)
NET         Network interfaces, iptables, routing tables, sockets
MNT         Root file system
UTS         Hostname
IPC         Inter-process communication
USER        UID/GID (security improvement)

Base tech of container (Namespace)

• Namespaces are created with the “clone()” system call.
• Namespaces are materialized by pseudo-files in /proc/<pid>/ns.
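A small sketch of reading those pseudo-files, assuming a Linux host. Each entry in /proc/<pid>/ns is a symbolic link whose target looks like pid:[4026531836]; two processes are in the same namespace when the numbers match. The function names and the proc_root parameter are choices made for this example, the latter so the code can be pointed at a test directory.

```python
import os
import re


def read_namespaces(pid="self", proc_root="/proc"):
    """Map each namespace entry of a process to its inode number."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    if not os.path.isdir(ns_dir):
        return result  # not a Linux host, or the process is gone
    for name in sorted(os.listdir(ns_dir)):
        try:
            target = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # not a symlink, or not permitted
        match = re.fullmatch(r"(\w+):\[(\d+)\]", target)
        if match:
            result[name] = int(match.group(2))
    return result


def same_namespace(pid_a, pid_b, ns_type, proc_root="/proc"):
    """True when both processes share the namespace of the given type."""
    a = read_namespaces(pid_a, proc_root).get(ns_type)
    b = read_namespaces(pid_b, proc_root).get(ns_type)
    return a is not None and a == b


print(read_namespaces())  # namespaces of the current process (empty off Linux)
```

Comparing a containerized process with a host process this way typically shows different numbers for the pid, net, mnt, uts, and ipc entries.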

Base tech of container (Summary)

Why do we need CGroups?
• SLA management: reduce resource contention and increase predictability in performance
• Large virtual consolidation: prevent a single virtual machine, or a group of them, from monopolizing resources or impacting other environments

Cgroups
- Limit the use of resources

Namespaces
- Limit what resources can be seen
- Provide processes with their own view of the system

[Figure: Docker on top of libcontainer, which uses namespaces and cgroups in the Linux kernel]

Base tech of container (COW)

Everyone shares a single copy of the same data until it is overwritten; only then is a copy made.

Docker uses COW (copy-on-write), which essentially means that every instance of your Docker image uses the same files until one of them needs to change a file.

K8S terms

ReplicationControllers

Dynamically manage (create, kill, etc.) the lifecycle of pods (scaling up/down, rolling updates).
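The create/kill behavior can be pictured as a reconcile step that compares the desired replica count with the pods actually running. This is a simplified sketch, not the Kubernetes implementation; the function name and pod names are hypothetical, and the choice of which surplus pods to kill is arbitrary here.

```python
def reconcile(desired_replicas, running_pods):
    """Return (number_of_pods_to_create, pods_to_kill) to reach the target."""
    missing = desired_replicas - len(running_pods)
    if missing > 0:
        # Scale up: create the missing pods.
        return missing, []
    # Scale down (or nothing to do): kill the surplus pods, if any.
    return 0, running_pods[desired_replicas:]


# Hypothetical pod names, for illustration only.
print(reconcile(3, ["web-1"]))                    # (2, [])
print(reconcile(1, ["web-1", "web-2", "web-3"]))  # (0, ['web-2', 'web-3'])
print(reconcile(2, ["web-1", "web-2"]))           # (0, [])
```

Running this step repeatedly, whenever the observed state changes, is what keeps the number of pods at the desired value even when pods die.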

Clusters
• Pool of Kubernetes resources

Services
• an abstraction
• a REST object
• a logical set of pods and a policy

Pods
• a collocated group of Docker containers with shared volumes
• pods are born and die
• the deployable unit: created, scheduled, managed

[Figure: services in front of groups of pods; pods hold containers and run on servers; an iptables rule leads to the endpoints]

K8S terms

    {
      "kind": "Service",
      "apiVersion": "v1",
      "metadata": {
        "name": "my-service"
      },
      "spec": {
        "selector": {
          "app": "MyApp"
        },
        "ports": [
          {
            "protocol": "TCP",
            "port": 80,
            "targetPort": 9376
          }
        ]
      }
    }

[Figure: the service my-service has a cluster IP and the selector “app: MyApp”; the service proxy sends traffic to the pod endpoints on targetPort 9376]
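How a selector turns into endpoints can be sketched as label matching: every pod whose labels contain all of the selector's key/value pairs becomes an endpoint on the target port. The pods, their names, and their IPs below are invented for the example, and select_endpoints is a hypothetical helper, not a Kubernetes API.

```python
service = {
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {"name": "my-service"},
    "spec": {
        "selector": {"app": "MyApp"},
        "ports": [{"protocol": "TCP", "port": 80, "targetPort": 9376}],
    },
}

# Hypothetical pods; names, labels and IPs are made up for illustration.
pods = [
    {"name": "pod-a", "ip": "192.168.1.10", "labels": {"app": "MyApp"}},
    {"name": "pod-b", "ip": "192.168.2.11", "labels": {"app": "MyApp", "tier": "web"}},
    {"name": "pod-c", "ip": "192.168.3.12", "labels": {"app": "Other"}},
]


def select_endpoints(service, pods):
    """Return (pod_ip, target_port) for every pod matching the selector."""
    selector = service["spec"]["selector"]
    target_port = service["spec"]["ports"][0]["targetPort"]
    endpoints = []
    for pod in pods:
        labels = pod.get("labels", {})
        if all(labels.get(key) == value for key, value in selector.items()):
            endpoints.append((pod["ip"], target_port))
    return endpoints


print(select_endpoints(service, pods))
# [('192.168.1.10', 9376), ('192.168.2.11', 9376)]
```

Extra labels on a pod (such as tier on pod-b) do not prevent a match; only the selector's own keys are compared.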

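K8S terms (routing mode of service traffic)

mode: userspace
[Figure: Master, kube-proxy, iptables rule, service, pod, and three endpoints; the iptables rule redirects service traffic to kube-proxy, which forwards it to an endpoint]

mode: iptables
[Figure: Master, kube-proxy, iptables rule, service, pod, and three endpoints; the iptables rule redirects service traffic to an endpoint]
• Fast
• Reliable
But:
• No retry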

How K8S works

Kubernetes Master
• API server: validates and configures Pods, Services, and Replication controllers; serves REST operations
• etcd: the master's status is stored here
• Scheduler: schedules pods to worker nodes; synchronizes pod status
• Kubernetes controller manager server

Worker Node
• kubelet
• kube-proxy
• Container manifest: YAML (description of pod)
• Services and pods

[Figure: components connected over ports 8080 and 4001]

K8S Service Traffic Flows

[Figure: a request enters Service 1 (front-end) and flows through Service 2 (…) to Service 3 (back-end). Each service has its own kube-proxy and pods with containers, managed by replication controllers (rc:3, rc:1, rc:2). SkyDNS instances are shown in the cluster domain; pods live in the cluster pool.]

Cluster-domain: 10.100.0.10 (Service_Cluster_IP_Range, virtual IP)
Cluster-pool: 192.168.0.0/16

K8S Service Traffic Flows (e.g.)

Then, what is kube-proxy?

[Figure: Node #1 and Node #2 with kube-proxy, an iptables rule, and pods with containers]

Kube-proxy watches the Kubernetes master for added and removed objects:
- Service
- Endpoints

• Can do simple TCP and UDP stream forwarding
• Round-robin TCP and UDP forwarding
• The VIP is managed by kube-proxy
• Watches all services
• Updates iptables after the backends change
• Translates the service IP to a pod IP
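The "translate service IP to pod IP" and round-robin points can be sketched as a small lookup table that is refreshed whenever the endpoints of a service change. This is a toy model, not kube-proxy's code; the class name, method names, and all IP addresses are hypothetical.

```python
import itertools


class ServiceProxy:
    """Toy model: map a service VIP and port to pod endpoints, round-robin."""

    def __init__(self):
        self._backends = {}  # (vip, port) -> endless iterator over endpoints

    def update(self, vip, port, endpoints):
        """Called when a Service or its Endpoints change on the master."""
        endpoints = list(endpoints)
        if endpoints:
            self._backends[(vip, port)] = itertools.cycle(endpoints)
        else:
            self._backends.pop((vip, port), None)

    def translate(self, vip, port):
        """Pick the next pod endpoint for traffic sent to the service VIP."""
        if (vip, port) not in self._backends:
            raise LookupError(f"no endpoints for {vip}:{port}")
        return next(self._backends[(vip, port)])


# Hypothetical addresses, for illustration only.
proxy = ServiceProxy()
proxy.update("10.100.0.20", 80, [("192.168.1.10", 9376), ("192.168.2.11", 9376)])
for _ in range(3):
    print(proxy.translate("10.100.0.20", 80))
```

The three calls cycle through the first endpoint, the second, and then the first again, which is the round-robin behavior listed above.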

Master and etcd cluster

• API Server
• etcd: cluster status, current configuration

SkyDNS

SkyDNS in Kubernetes?
Kubernetes offers a DNS cluster add-on, which most supported environments enable by default.
SkyDNS is a DNS service, with some custom logic that keeps it in sync with the Kubernetes API server.

Create Service: a DNS name is mapped to the service.

A virtual IP address is assigned to the service.

kubelet --v=5 \
  --address=0.0.0.0 \
  --port=10250 \
  --hostname_override=105.144.47.24 \
  --api_servers=105.*.*.23:8080 \
  --healthz_bind_address=0.0.0.0 \
  --healthz_port=10248 \
  --network_plugin=calico \
  --cluster-domain=cluster.local \
  --cluster-dns=10.100.0.10 \
  --logtostderr=true
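With a cluster domain of cluster.local, the DNS name mapped to a service can be sketched as below. This assumes the common Kubernetes naming scheme service.namespace.svc.cluster-domain; the deck does not state which scheme its cluster used, and the namespace default and the virtual IP in the example are assumptions.

```python
def service_dns_name(service, namespace="default", cluster_domain="cluster.local"):
    """Build the DNS name under which a service is expected to resolve."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


# Hypothetical record: the service name mapped to its virtual IP.
records = {service_dns_name("my-service"): "10.100.0.20"}

print(service_dns_name("my-service"))
print(records[service_dns_name("my-service")])
```

Pods then look the name up through the DNS server given by --cluster-dns (10.100.0.10 in the command above) and receive the service's virtual IP.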

SkyDNS (cont.)

• etcd in pod (DNS records)
• SkyDNS in pod (DNS server)
• kube2sky in pod (bridging between Kubernetes and etcd)
• Kubernetes (kubelet): running pods
• Kubernetes (master)

Service info is published (written) into etcd; SkyDNS is then able to retrieve the name of the service.

The kubelet presents itself to pods as a DNS server.

Service info is pulled from the master into SkyDNS, e.g. which services have changed?

[Figure: arrows labeled Retrieve, Search, Query, and Update connect these components]

Thank You
