See Docker from the Perspective of Linux Process Allen Sun@DaoCloud Hangzhou Docker Meetup 2015.03.14

Hangzhou Docker Meetup 2015.03files.meetup.com/16521172/SunHongliang_See_Docker_from_the... · Hangzhou Docker Meetup 2015.03.14 . ... Mauerer W. Professional Linux kernel architecture[M]



See Docker from the Perspective of Linux Process

Allen Sun@DaoCloud Hangzhou Docker Meetup

2015.03.14


Agenda

1. Prerequisite

Linux Process (do_fork / copy_process)

Namespaces

2. How Docker deals with processes

dockerinit, ENTRYPOINT, CMD


syscall: fork()

Process A (parent, original PID) calls fork() and continues running; it later calls wait().

Process B (child, new PID) calls execve() and executes a different program! When it calls exit() it becomes a ZOMBIE and SIGCHLD is sent to the parent; the parent's wait() then cleans it up.

Reference: http://www.lynx.com/the-fork-call-posix-processes-and-parent-child-relationships
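
The fork / exec / wait life cycle on this slide can be tried from Go. The sketch below is only an illustration: Go has no bare fork(), so it relies on os/exec, where Start performs the fork and exec in one step and Wait collects the exit status. The helper name spawnAndReap and the use of "sh" as the "different program" are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// spawnAndReap starts a child that runs a different program (sh) and then
// waits for it. It returns the parent's PID, the child's PID and the
// child's exit status.
func spawnAndReap(exitCode int) (int, int, int, error) {
	// "sh -c 'exit N'" is only an illustrative child program.
	cmd := exec.Command("sh", "-c", fmt.Sprintf("exit %d", exitCode))

	// Start forks and execs: the child gets a new PID.
	if err := cmd.Start(); err != nil {
		return 0, 0, 0, err
	}
	parentPID, childPID := os.Getpid(), cmd.Process.Pid

	// Wait collects the exit status, which lets the kernel drop the zombie.
	if err := cmd.Wait(); err != nil {
		// A non-zero exit status is reported as *exec.ExitError; that is expected here.
		if _, ok := err.(*exec.ExitError); !ok {
			return parentPID, childPID, 0, err
		}
	}
	return parentPID, childPID, cmd.ProcessState.ExitCode(), nil
}

func main() {
	parent, child, status, err := spawnAndReap(3)
	if err != nil {
		fmt.Println("spawn failed:", err)
		return
	}
	fmt.Printf("parent PID %d, child PID %d, child exit status %d\n", parent, child, status)
}
```

Until Wait is called, an exited child stays in the process table as a zombie, which is the state shown on the slide.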


do_fork

do_fork
    copy_process
    determine PID
    wake_up_new_task
    wait_for_completion (only if CLONE_VFORK is set)

copy_process
    check flags
    dup and init task_struct
    check resource limit
    copy/share process details
        copy_semundo
        copy_namespaces
        ……
    set IDs, task relationships, etc.
    ……

Reference: Mauerer W. Professional Linux kernel architecture[M]. John Wiley & Sons, 2010. Figure 2-7 and Figure 2-8.


task_struct and namespaces

struct task_struct
    struct nsproxy *nsproxy        -> struct nsproxy

struct nsproxy
    struct uts_namespace *uts_ns   -> struct uts_namespace
    struct mnt_namespace *mnt_ns   -> struct mnt_namespace
    struct net *net_ns             -> struct net

nsproxy proxies 5 kinds of namespace for a process:

1. uts_namespace 2. mnt_namespace 3. pid_namespace 4. ipc_namespace 5. net

user_namespace is not in nsproxy! Based on Linux kernel 3.13
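
From user space, the namespaces a process belongs to can be inspected through the links under /proc/<pid>/ns. The following sketch, which assumes a Linux system with /proc mounted, reads those links for the five namespace kinds listed above; parseNsLink is a helper name invented for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNsLink splits a /proc/<pid>/ns/<kind> link target such as
// "uts:[4026531838]" into the namespace kind and its inode number.
func parseNsLink(target string) (string, uint64, error) {
	parts := strings.SplitN(target, ":[", 2)
	if len(parts) != 2 || !strings.HasSuffix(parts[1], "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	inode, err := strconv.ParseUint(strings.TrimSuffix(parts[1], "]"), 10, 64)
	if err != nil {
		return "", 0, err
	}
	return parts[0], inode, nil
}

func main() {
	// The five kinds reachable through nsproxy on the slide.
	for _, kind := range []string{"uts", "mnt", "pid", "ipc", "net"} {
		target, err := os.Readlink("/proc/self/ns/" + kind)
		if err != nil {
			fmt.Println(kind, "unavailable:", err)
			continue
		}
		k, inode, err := parseNsLink(target)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-4s namespace inode %d\n", k, inode)
	}
}
```

Two processes that print the same inode number for a kind share that namespace, so running this on the host and inside a container is a quick way to see which namespaces differ.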


What is in namespaces?

struct pid_namespace {
    …
    struct task_struct *child_reaper;
    …
    int level;
    struct pid_namespace *parent;
};

struct mnt_namespace {
    atomic_t count;
    struct mount *root;
    struct list_head list;
    ……
};

struct uts_namespace {
    struct kref kref;
    struct new_utsname name;
    struct user_namespace *user_ns;
    ……
};

struct new_utsname {
    char sysname[..];
    char nodename[..];
    char release[..];
    char version[..];
    char machine[..];
    char domainname[..];
};
……

Based on Linux kernel 3.13
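
The level and parent fields of pid_namespace describe a tree of PID namespaces. The toy model below is not kernel code; it is a sketch with made-up Go types that mirrors only those two fields, to show that a process in a nested namespace is also visible (with a different PID) in every ancestor namespace.

```go
package main

import "fmt"

// PIDNamespace is a toy stand-in for the kernel's struct pid_namespace;
// only the level and parent fields from the slide are modelled.
type PIDNamespace struct {
	Level  int           // 0 for the initial (host) namespace
	Parent *PIDNamespace // nil for the initial namespace
}

// NewChild models a clone with CLONE_NEWPID: the new namespace sits one
// level below the namespace it was created from.
func (ns *PIDNamespace) NewChild() *PIDNamespace {
	return &PIDNamespace{Level: ns.Level + 1, Parent: ns}
}

// VisibleIn returns the levels of all namespaces in which a process living
// in ns has a PID: its own namespace and every ancestor.
func (ns *PIDNamespace) VisibleIn() []int {
	var levels []int
	for cur := ns; cur != nil; cur = cur.Parent {
		levels = append(levels, cur.Level)
	}
	return levels
}

func main() {
	host := &PIDNamespace{}
	container := host.NewChild()
	nested := container.NewChild()
	fmt.Println(nested.VisibleIn()) // prints [2 1 0]
}
```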


Docker? Where is Docker?

Docker Client -> Docker Daemon -> (fork!) -> Docker Container, Docker Container, ……

fork: do_fork -> copy_process -> copy_namespaces

exec: do_execve

A Docker container is born simply by forking and exec-ing a process through syscalls!


Difference (Docker’s fork vs normal fork)

Special flags passed to the clone() syscall, which the kernel handles in do_fork()

flag name        available since Linux kernel version
CLONE_NEWNS      2.4.19
CLONE_NEWUTS     2.6.19
CLONE_NEWIPC     2.6.19
CLONE_NEWPID     2.6.24
CLONE_NEWNET     2.6.29
CLONE_NEWUSER    3.8
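
The flags in this table are plain bits that get OR-ed together in the clone() flags argument. As a small sketch (Linux only, since the constants come from Go's syscall package on Linux), the helper below decodes a flag word back into names; describeFlags and namespaceFlags are names chosen for this example.

```go
package main

import (
	"fmt"
	"syscall"
)

// namespaceFlags lists the clone() namespace flags in the order of the table.
var namespaceFlags = []struct {
	Name  string
	Value uintptr
}{
	{"CLONE_NEWNS", syscall.CLONE_NEWNS},
	{"CLONE_NEWUTS", syscall.CLONE_NEWUTS},
	{"CLONE_NEWIPC", syscall.CLONE_NEWIPC},
	{"CLONE_NEWPID", syscall.CLONE_NEWPID},
	{"CLONE_NEWNET", syscall.CLONE_NEWNET},
	{"CLONE_NEWUSER", syscall.CLONE_NEWUSER},
}

// describeFlags returns the names of all namespace flags set in flags.
func describeFlags(flags uintptr) []string {
	var names []string
	for _, f := range namespaceFlags {
		if flags&f.Value != 0 {
			names = append(names, f.Name)
		}
	}
	return names
}

func main() {
	flags := uintptr(syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNET)
	fmt.Printf("%#x -> %v\n", flags, describeFlags(flags))
}
```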


Namespaces in Docker

func init() {
    namespaceList = Namespaces{
        {Key: "NEWNS", Value: syscall.CLONE_NEWNS, File: "mnt"},
        {Key: "NEWUTS", Value: syscall.CLONE_NEWUTS, File: "uts"},
        {Key: "NEWIPC", Value: syscall.CLONE_NEWIPC, File: "ipc"},
        {Key: "NEWUSER", Value: syscall.CLONE_NEWUSER, File: "user"},
        {Key: "NEWPID", Value: syscall.CLONE_NEWPID, File: "pid"},
        {Key: "NEWNET", Value: syscall.CLONE_NEWNET, File: "net"},
    }
}

Based on libcontainer v1.2.0

USER_NAMESPACE: not fully implemented in Docker

NET_NAMESPACE: not used in network modes "host" and "other container"
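
To see what "fork with namespace flags" looks like from Go, here is a minimal sketch that uses the Cloneflags field of syscall.SysProcAttr. It is not libcontainer's code: namespacedCommand is a made-up helper, the flag selection (everything except CLONE_NEWUSER) is an assumption based on the notes above, and actually starting the child normally requires root (CAP_SYS_ADMIN) on Linux.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// namespacedCommand prepares (but does not start) a command whose process
// is cloned into new mount, UTS, IPC, PID and net namespaces.
func namespacedCommand(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		// CLONE_NEWUSER is left out on purpose (assumption for this sketch).
		Cloneflags: syscall.CLONE_NEWNS |
			syscall.CLONE_NEWUTS |
			syscall.CLONE_NEWIPC |
			syscall.CLONE_NEWPID |
			syscall.CLONE_NEWNET,
	}
	return cmd
}

func main() {
	// In a new PID namespace the first process sees itself as PID 1.
	cmd := namespacedCommand("sh", "-c", "echo PID inside the new namespaces: $$")
	if err := cmd.Run(); err != nil {
		// Without sufficient privileges the clone fails; report it and move on.
		fmt.Println("could not start namespaced child:", err)
	}
}
```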


What to Fork?

Docker Client -> Docker Daemon -> (fork with flags!) -> ? -> …… Docker Container

Fork a Docker Container?

Docker Container == Process(es)?


What Process to Fork?

Whatever! It only has to be a process.

The process has just been forked; nothing has been exec-ed yet.

The result looks like this:

task_struct ready

namespaces ready

other resources ready

The process is still static: no new program has been loaded into it yet.


Then exec! exec what?

Have you ever heard of dockerinit, ENTRYPOINT or CMD in Docker?

name         description
dockerinit   The init program that runs first inside the new namespaces to set up the mount and net namespaces and other things.
ENTRYPOINT   An ENTRYPOINT allows you to configure a container that will run as an executable.
CMD          The main purpose of a CMD is to provide defaults for an executing container.

Reference: https://docs.docker.com/reference/builder
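
How ENTRYPOINT and CMD combine can be shown with a few lines of Go. This is a sketch of the documented exec-form behaviour (CMD supplies default arguments to ENTRYPOINT, and arguments given on docker run replace CMD), not Docker's actual implementation; containerArgv and its parameters are names invented for the example.

```go
package main

import "fmt"

// containerArgv joins ENTRYPOINT and CMD (both in exec form) into the
// argument vector of the container's main process. A non-empty override
// stands for arguments passed on "docker run" and replaces CMD.
func containerArgv(entrypoint, cmd, override []string) []string {
	args := cmd
	if len(override) > 0 {
		args = override
	}
	argv := make([]string, 0, len(entrypoint)+len(args))
	argv = append(argv, entrypoint...)
	return append(argv, args...)
}

func main() {
	entrypoint := []string{"top", "-b"}
	cmd := []string{"-c"}

	fmt.Printf("%q\n", containerArgv(entrypoint, cmd, nil))            // CMD used as default arguments
	fmt.Printf("%q\n", containerArgv(entrypoint, cmd, []string{"-H"})) // CMD replaced by the override
	fmt.Printf("%q\n", containerArgv(nil, []string{"echo", "hi"}, nil)) // no ENTRYPOINT: CMD is the command
}
```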


dockerinit, ENTRYPOINT, CMD

Docker Daemon process
    fork: new namespaces
    exec: 1. dockerinit (init namespaces) -> 2. ENTRYPOINT -> 3. CMD

1, 2 and 3 are the only process (same PID).
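
The "same PID" claim follows from how exec works: it replaces the program inside an existing process instead of creating a new one. The sketch below checks this with two shells standing in for dockerinit and ENTRYPOINT (an assumption for illustration; pidsAcrossExec is a made-up helper and a POSIX sh is assumed to be on the PATH).

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// pidsAcrossExec starts one child process. The first shell prints its PID
// and then execs a second shell, which prints its own PID. Both values are
// returned as strings.
func pidsAcrossExec() (string, string, error) {
	// The outer shell expands the first $$; the single-quoted $$ is expanded
	// by the second shell, after the exec.
	script := `echo $$; exec sh -c 'echo $$'`
	out, err := exec.Command("sh", "-c", script).Output()
	if err != nil {
		return "", "", err
	}
	pids := strings.Fields(string(out))
	if len(pids) != 2 {
		return "", "", fmt.Errorf("unexpected output %q", out)
	}
	return pids[0], pids[1], nil
}

func main() {
	before, after, err := pidsAcrossExec()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Printf("PID before exec: %s, PID after exec: %s\n", before, after)
}
```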


Docker Daemon and dockerinit

Docker Daemon (parent) <- syncPipe -> dockerinit (child)

Usage: coordinate the sequencing of Docker Daemon and dockerinit.

dockerinit is blocked as long as there is nothing to read from syncPipe.

Why?


How to coordinate?

Docker Daemon steps, with the state of dockerinit at each step:

1. Create Command: the executable in the container (dockerinit)

2. Create syncPipe

3. Pass pipe to child

4. command.start(): fork and exec the command (fork, new PID!)
   dockerinit: syncPipe (nothing), blocked

5. SetupCgroups
   dockerinit: syncPipe (nothing), blocked, controlled by cgroup

6. Init network
   dockerinit: syncPipe (nothing), blocked, controlled by cgroup

7. Sync with child
   dockerinit: syncPipe (has networkState), read from syncPipe

Based on libcontainer v1.2.0
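
The handshake in steps 1 to 7 can be approximated in a few lines of Go. This is a simplified sketch, not libcontainer's syncpipe implementation: it uses a plain os.Pipe, the child "sh -c 'cat <&3'" merely stands in for dockerinit, and runWithSyncPipe and the example state string are invented for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// runWithSyncPipe starts a child that blocks reading fd 3 until the parent
// has finished its setup and sent state through the pipe. It returns what
// the child read.
func runWithSyncPipe(state string) (string, error) {
	// Step 2: create the pipe.
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}

	// Steps 1 and 3: create the command and pass the read end to the child.
	// ExtraFiles[0] becomes file descriptor 3 in the child.
	var out bytes.Buffer
	cmd := exec.Command("sh", "-c", "cat <&3")
	cmd.ExtraFiles = []*os.File{r}
	cmd.Stdout = &out

	// Step 4: fork and exec; the child now blocks on the empty pipe.
	if err := cmd.Start(); err != nil {
		r.Close()
		w.Close()
		return "", err
	}
	r.Close() // the parent keeps only the write end

	// Steps 5 and 6 (cgroups, network) would happen here in the real daemon.

	// Step 7: send the state, then close the pipe so the child sees EOF.
	_, writeErr := w.WriteString(state)
	closeErr := w.Close()
	waitErr := cmd.Wait()
	for _, e := range []error{writeErr, closeErr, waitErr} {
		if e != nil {
			return "", e
		}
	}
	return out.String(), nil
}

func main() {
	got, err := runWithSyncPipe("example network state")
	if err != nil {
		fmt.Println("sync failed:", err)
		return
	}
	fmt.Printf("child received: %q\n", got)
}
```

The child cannot make progress before the parent writes, which is exactly the window the daemon uses for its cgroup and network setup.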


How to coordinate?

dockerinit

1. SetupNetwork

2. SetupRoute

3. Init mount ns: set up devices, mount points and fs

4. Apply AppArmor

5. execv ENTRYPOINT (exec, same PID!)

x. execv CMD (exec, same PID!)

Finally, YOUR APP!

Docker Daemon

8. command.wait()

Based on libcontainer v1.2.0


Docker Container

Docker Daemon process
    fork: new namespaces
    exec: 1. dockerinit (init namespaces) -> 2. ENTRYPOINT -> 3. CMD (your application)

1, 2 and 3 are the only process (same PID), with cgroups applied.

Docker Container: process, process, process, process


Why Coordinate?

1. Docker Daemon needs to synchronize with dockerinit: dockerinit is blocked so that none of its children can escape from the cgroups.

2. Namespaces cannot be switched inside the Go runtime: dockerinit stays blocked until Docker Daemon transfers the network details that will be used to set up the network interface in the new net namespace.
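
Point 1 depends on the fact that a process is moved into a cgroup by writing its PID to that cgroup's cgroup.procs file, and that children forked afterwards start in the same cgroup. The sketch below shows only the write itself; it is not Docker's code, joinCgroup and demoJoin are made-up names, and a temporary directory stands in for a real cgroup directory so that the example runs without privileges.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// joinCgroup moves the process pid into the cgroup at dir by writing the
// PID to its cgroup.procs file.
func joinCgroup(dir string, pid int) error {
	procs := filepath.Join(dir, "cgroup.procs")
	return os.WriteFile(procs, []byte(strconv.Itoa(pid)), 0o644)
}

// demoJoin uses a temporary directory in place of a real cgroup directory
// and returns what ended up in its cgroup.procs file.
func demoJoin(pid int) (string, error) {
	dir, err := os.MkdirTemp("", "fake-cgroup")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	if err := joinCgroup(dir, pid); err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(dir, "cgroup.procs"))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(data)), nil
}

func main() {
	got, err := demoJoin(os.Getpid())
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("PID written to the stand-in cgroup.procs:", got)
}
```

If the daemon performs this write while dockerinit is still blocked on the syncPipe, dockerinit has had no chance to fork anything outside the cgroup.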


Q&A


THANK YOU !

Email: [email protected] Weibo: @莲子弗如清 WeChat: shlallen