@snehainguva

Containers: The What, Why, and How

digitalocean.com

about me

● software engineer @DigitalOcean, delivery team

● kubernetes, prometheus, terraform

the plan:

● Build your own container

● Containers vs. VMs

● Container ecosystem

what is a container?

● “a lightweight OS-level virtualization method”

● “stand-alone piece of executable software”

● “NOT a virtual machine”

build your own container

1. run input commands with arguments

2. add hostname limitations

3. add process ID limitations

4. add mount point/filesystem limitations

let’s start with a basic “container”

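package main

import (
	"fmt"
	"os"
	"os/exec"
)

// usage: <binary> run <command> [args...]
func main() {
	switch os.Args[1] {
	case "run":
		run()
	default:
		panic("what?")
	}
}

// run starts the requested command as an ordinary child process,
// wiring its stdin/stdout/stderr to ours (no isolation yet)
func run() {
	fmt.Printf("running %v\n", os.Args[2:])

	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	must(cmd.Run())
}

func must(err error) {
	if err != nil {
		panic(err)
	}
}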

how can we restrict hostname access?

namespaces!!!

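func run() {
	fmt.Printf("running %v\n", os.Args[2:])

	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	// give the new process its own UTS namespace (hostname and domain name);
	// needs import "syscall", Linux, and root privileges
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS,
	}

	must(cmd.Run())
}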

UTS namespace

what about PID access?

UTS + PID namespace: attempt 1

func run() {
	fmt.Printf("running %v\n", os.Args[2:])

	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
	}

	must(cmd.Run())
}

UTS + PID namespace: attempt 2

// main() also needs a `case "child": child()` branch for the re-executed binary
func run() {
	// re-run this same binary as "child" so that child() executes inside the new namespaces
	cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
	}

	must(cmd.Run())
}

func child() {
	// first process in the new PID namespace, so this reports pid 1
	fmt.Printf("running %v as pid %v\n", os.Args[2:], os.Getpid())

	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	must(cmd.Run())
}
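Attempt 2 works because the parent re-executes its own binary through `/proc/self/exe` with a `child` argument. As a result, the child half of the program starts life already inside the new UTS and PID namespaces. The sketch below isolates only the argument plumbing, and the helpers `childCommand` and `role` are hypothetical. Nothing here creates namespaces, so it runs without root.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// childCommand builds the re-exec used in attempt 2:
//   /proc/self/exe child <cmd> <args...>
// /proc/self/exe is a Linux symlink to the binary that is currently running.
func childCommand(userArgs []string) *exec.Cmd {
	args := append([]string{"child"}, userArgs...)
	return exec.Command("/proc/self/exe", args...)
}

// role mirrors the switch in main: it reports which half of the program
// a given argument list selects.
func role(args []string) string {
	if len(args) < 2 {
		return "usage"
	}
	switch args[1] {
	case "run", "child":
		return args[1]
	default:
		return "unknown"
	}
}

func main() {
	cmd := childCommand([]string{"/bin/bash"})
	fmt.Println(cmd.Path, cmd.Args[1:])
	fmt.Println("selected role:", role(os.Args))
}
```

Inside the child, `os.Args` is `["/proc/self/exe", "child", <cmd>, <args>...]`, which is why `child()` reads the command from `os.Args[2]`.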

UTS + PID + MNT namespace: attempt 1

func run() {
	// /proc/self/exe re-executes the currently running binary
	cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// CLONE_NEWNS creates the new mount (MNT) namespace
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}

	must(cmd.Run())
}

Initial mounts in a new MNT namespace are inherited from the creating namespace → the filesystem looks the same as the host's
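The inheritance is easy to observe: list the mount points from inside the child and compare them with the host's list. The sketch below is a minimal version that assumes Linux. It reads `/proc/self/mounts`, whose lines have the form `device mountpoint fstype options dump pass`. `mountPoints` is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// mountPoints returns the mount point (second field) of every entry in
// /proc/self/mounts, i.e. the mount table of the caller's MNT namespace.
// Spaces in paths appear octal-escaped (\040) and are left as-is here.
func mountPoints() ([]string, error) {
	f, err := os.Open("/proc/self/mounts")
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var points []string
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 {
			points = append(points, fields[1])
		}
	}
	return points, sc.Err()
}

func main() {
	points, err := mountPoints()
	if err != nil {
		panic(err)
	}
	for _, p := range points {
		fmt.Println(p)
	}
}
```

Run inside attempt 1's child, this should print the same list as on the host, which motivates switching to a new root filesystem next.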

next step: UTS + PID + MNT namespace + new root filesystem

example

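func child() {
	fmt.Printf("running %v as pid %v\n", os.Args[2:], os.Getpid())

	// switch to the new root filesystem, then mount a fresh /proc so
	// tools like ps see only this PID namespace
	must(syscall.Chroot("/home/rootfs"))
	must(os.Chdir("/"))
	must(syscall.Mount("proc", "proc", "proc", 0, ""))

	// build the command after chroot so its path resolves inside the new root
	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	must(cmd.Run())
	must(syscall.Unmount("proc", 0))
}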

TODO

what is a container?

a process with isolation, shared resources, and layered filesystems

what is a container?

namespaces: a Linux kernel feature that isolates and virtualizes system resources for a collection of processes and their children

● PID: gives a process its own view of a subset of the system's processes ✔

● MNT: gives a process its own mount table, so it can have its own filesystem ✔

● NET: gives a process its own network stack (a container can use virtual ethernet pairs to link to the host or to other containers)

● UTS: gives a process its own view of the system hostname and domain name ✔

● IPC: isolates inter-process communication (e.g. message queues)

● USER: the newest of these namespaces; maps process UIDs to a different set of UIDs on the host (e.g. a container's root UID can map to an unprivileged UID on the host)
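Each namespace in this list corresponds to a `clone(2)` flag that Go's `syscall` package exposes. The USER mapping is expressed through `SysProcAttr.UidMappings` and `GidMappings`. The sketch below requests all six namespaces and maps root inside the container to the current unprivileged user. The `containerAttr` helper and that mapping choice are assumptions for illustration. Actually starting the process also depends on the kernel allowing user namespaces, so the sketch only builds the attributes.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// allNamespaces maps each namespace from the list to its clone(2) flag (Linux only).
var allNamespaces = map[string]uintptr{
	"PID":  syscall.CLONE_NEWPID,
	"MNT":  syscall.CLONE_NEWNS,
	"NET":  syscall.CLONE_NEWNET,
	"UTS":  syscall.CLONE_NEWUTS,
	"IPC":  syscall.CLONE_NEWIPC,
	"USER": syscall.CLONE_NEWUSER,
}

// containerAttr requests every namespace above and maps UID/GID 0 inside
// the container to hostUID/hostGID outside it.
func containerAttr(hostUID, hostGID int) *syscall.SysProcAttr {
	var flags uintptr
	for _, f := range allNamespaces {
		flags |= f
	}
	return &syscall.SysProcAttr{
		Cloneflags: flags,
		UidMappings: []syscall.SysProcIDMap{
			{ContainerID: 0, HostID: hostUID, Size: 1},
		},
		GidMappings: []syscall.SysProcIDMap{
			{ContainerID: 0, HostID: hostGID, Size: 1},
		},
	}
}

func main() {
	cmd := exec.Command("/bin/sh")
	cmd.SysProcAttr = containerAttr(os.Getuid(), os.Getgid())
	fmt.Printf("clone flags: %#x\n", cmd.SysProcAttr.Cloneflags)
	// cmd.Run() would start /bin/sh in the new namespaces; it is omitted here
	// because it needs a kernel that permits (unprivileged) user namespaces.
}
```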

what is a container?

cgroups: control groups collect a set of process/task IDs together and apply limits to them, such as limits on resource utilization

● Enforce fair (or unfair) resource sharing between processes

● Exposed by the kernel as a special filesystem that you mount

● Add a process or thread by writing its ID to the cgroup's tasks file; read or configure values by editing the other files in the cgroup's subdirectory

what is a container?

layered filesystems: an efficient way to give each container its own view of the root filesystem without copying it in full

● One of the reasons it is easy to move containers around

● Can use “copy on write” (Btrfs)

● Can use “union mounts” (aufs, OverlayFS), a way of combining multiple directories into a single view
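OverlayFS is the union mount most commonly seen today. The read-only image layers become `lowerdir` entries, and a per-container `upperdir` receives copy-on-write changes. `workdir` is an empty scratch directory on the same filesystem as `upperdir`. The sketch below shows the mount call. The layer paths are made up for illustration, and `overlayOptions` and `mountOverlay` are hypothetical helpers. Only the option-string builder is exercised, because the actual mount needs root.

```go
package main

import (
	"fmt"
	"strings"
	"syscall"
)

// overlayOptions builds the data string for an OverlayFS mount.
// lowerdirs are read-only layers listed top-most first (OverlayFS stacks
// them from right to left); upperdir receives copy-on-write changes.
func overlayOptions(lowerdirs []string, upperdir, workdir string) string {
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s",
		strings.Join(lowerdirs, ":"), upperdir, workdir)
}

// mountOverlay presents the merged layers at target (Linux only, requires root).
func mountOverlay(target string, lowerdirs []string, upperdir, workdir string) error {
	return syscall.Mount("overlay", target, "overlay", 0,
		overlayOptions(lowerdirs, upperdir, workdir))
}

func main() {
	// hypothetical layout: an app layer on top of a base image layer
	fmt.Println(overlayOptions([]string{"/layers/app", "/layers/base"}, "/c1/upper", "/c1/work"))
}
```

Many containers can share the same lower layers, and each one gets its own upper and work directories. This is what keeps per-container filesystem copies cheap.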

Containers vs. VMs

containers vs. VMs

Source: http://electronicdesign.com/dev-tools/what-s-difference-between-containers-and-virtual-machines

VMs

● Hypervisors run software on physical servers to emulate a particular hardware system (aka a virtual machine)

● Each VM runs a full copy of an operating system (OS)

● Hardware is also virtualized

● Can run multiple applications

containers

● Run as isolated processes on a single server or host operating system (OS)

● Can migrate only to servers with compatible OS kernels

● Best for a single application
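The kernel-compatibility point follows from containers sharing the host kernel. A process inside a container (with runtimes such as runc) reports the host's kernel release, whereas a process in a VM reports its guest kernel's release. The sketch below reads the release from `/proc` and assumes Linux. `kernelRelease` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// kernelRelease returns the release string of the kernel this process runs
// on (the same value `uname -r` prints). In a container that shares the
// host kernel, this is the host's kernel.
func kernelRelease() (string, error) {
	b, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

func main() {
	rel, err := kernelRelease()
	if err != nil {
		panic(err)
	}
	fmt.Println("kernel:", rel)
}
```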

container ecosystem

● Container runtime

● Orchestration tools

● As-a-service

containers

Source: https://docs.docker.com/engine/understanding-docker/ https://coreos.com/rkt/docs/latest/rkt-vs-other-projects.html#rkt-vs-docker

container orchestration

Source: https://github.com/nkhare/container-orchestration/blob/master/kubernetes/README.md

___ as-a-service: container service, managed clusters, etc.

Source: https://coreos.com/tectonic/