1/22
Building blocks of Linux Containers
Motiejus Jakstys
motiejus@uber.com
@mo_kelione
2016-11-18
© 2016. Uber Technologies Inc. All rights reserved.
2/22
Table of Contents
Introduction
  Why me
  A container in Linux is...
Namespaces
  Isolation in Linux
  What did we just do
File systems and COW
What did we forget?
  Leftover elephants in the room
The End
3/22
Conclusion!
Devil hides in the details.
- Many use Docker.
- We lack time to understand.
- You need to understand infra to successfully troubleshoot infra.
- There are trade-offs in the configuration.
- Make a container engine in 30 minutes.
- Details!
→ You will still pick existing tools.
4/22
Why me
My resume: on-call experience.
- 2009-2012 Telecom (Dev + Ops).
- 2012-2014 Online Gaming (Dev + Ops).
- 2014-2016 Amazon (Dev + Ops).
- 2016-now Uber (Dev + Ops):
  - From 2016.02: Dev.
  - From 2016.11: SRE.
I had to understand how exactly infrastructure works.
5/22
A container in Linux is...
Fork/exec with bells & whistles:
- Fancy tarball for distribution.
- COW filesystem to make it start fast.
- Cgroups for fairness.
- Namespaces for isolation.
7/22
We will cover
- User namespaces.
- Pid namespaces.
- Mount namespaces.
- Network namespaces.
- There are more, but not today.
8/22
User namespace
Become container-local root.
unshare --map-root-user
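A minimal sketch of this step, assuming util-linux `unshare` is installed and the kernel permits unprivileged user namespaces. The helper name `as_container_root` is made up for illustration.

```shell
#!/bin/sh
# Run a command as container-local root (UID 0 inside the new user
# namespace); the UID seen from outside does not change.
as_container_root() {
    unshare --map-root-user "$@"
}

echo "outside: uid $(id -u)"
# Assumption: the kernel allows unprivileged user namespaces.
as_container_root sh -c 'echo "inside:  uid $(id -u)"' ||
    echo "user namespaces not available here" >&2
```

Inside the namespace `id -u` reports 0, yet files created there still belong to the original user on the host.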
9/22
Mount namespace
Hide container mounts.
unshare --mount
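A sketch of hiding a mount from the host. `--map-root-user` is added here (it is not on the slide) so the example does not need real root; `private_tmp` and the file name are hypothetical.

```shell
#!/bin/sh
# Mount a fresh tmpfs over /tmp inside a new mount namespace.
# The mount and the file created on it are visible only inside.
private_tmp() {
    unshare --map-root-user --mount sh -c '
        mount -t tmpfs tmpfs /tmp &&
        touch /tmp/only-in-namespace &&
        ls /tmp'
}

private_tmp || echo "mount namespaces not available here" >&2
# Back outside, the host /tmp is untouched.
ls /tmp/only-in-namespace 2>/dev/null || echo "not visible outside"
```

The tmpfs disappears together with the namespace once the last process in it exits.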
10/22
Pid namespace
Hide other pids.
unshare --pid --mount-proc --fork
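A sketch of the pid namespace step. `--fork` is needed because `unshare` itself does not move into the new namespace; its first child becomes PID 1. `--map-root-user` is again an addition to avoid real root, and `pid_ns` is a hypothetical helper name.

```shell
#!/bin/sh
# Start a command as PID 1 of a new pid namespace, with a matching /proc.
pid_ns() {
    unshare --map-root-user --pid --mount-proc --fork "$@"
}

# Inside, the shell sees itself as PID 1 and /proc lists only
# processes that live in the new namespace.
pid_ns sh -c 'echo "my pid: $$"; ls -d /proc/[0-9]*' ||
    echo "pid namespaces not available here" >&2
```

Without `--mount-proc`, tools reading `/proc` would still show the host's process list.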
11/22
Network namespace
Demonstrate this:
- Create namespace.
- Activate loopback (lo).
- Create pair of devices veth1a and veth1b:
  - veth1b will go to the namespace.
  - veth1a will stay in default.
- Add IP addresses.
- curl and ping.
- lsof, bind on ports separately.
Ever wanted to run tcpdump on an application?
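The steps above can be sketched with iproute2. The namespace name `t1`, the device names and the addresses come from the slides; the `/24` prefix length, the `run` helper and the dry-run default are assumptions made for this example. Executing it for real requires root.

```shell
#!/bin/sh
# Dry run by default: print each command, execute only when DRY_RUN=0.
# Executing for real needs root and the iproute2 "ip" tool.
DRY_RUN=${DRY_RUN:-1}
run() {
    printf '+ %s\n' "$*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

netns_demo() {
    # Create the namespace and bring up its loopback.
    run ip netns add t1 || return
    run ip netns exec t1 ip link set lo up || return
    # Create the veth pair; veth1b moves into t1, veth1a stays in default.
    run ip link add veth1a type veth peer name veth1b || return
    run ip link set veth1b netns t1 || return
    # Addresses from the slides; the /24 prefix length is an assumption.
    run ip addr add 10.0.0.1/24 dev veth1a || return
    run ip link set veth1a up || return
    run ip netns exec t1 ip addr add 10.0.0.2/24 dev veth1b || return
    run ip netns exec t1 ip link set veth1b up || return
    # Check connectivity in both directions.
    run ping -c 1 10.0.0.2 || return
    run ip netns exec t1 ping -c 1 10.0.0.1
}

netns_demo
```

Once the namespace exists, `ip netns exec t1 tcpdump -i veth1b` captures only the traffic of whatever runs inside `t1`, which is one way to answer the tcpdump question.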
12/22
Network namespace
[Diagram: namespaces "default" and "t1", linked by the veth pair veth1a <-> veth1b]
default: lo 127.0.0.1, eth0 192.0.2.1, veth1a 10.0.0.1
t1: lo 127.0.0.1, veth1b 10.0.0.2
13/22
What did we just do
Created a container:
- User namespace: apt-get, iptables, mount, etc.
- Isolated pids: no nobody, isolate from each other.
- Isolated mounts: e.g. for /tmp.
- Isolated network: safely bind to :80.
An improvement over "run and hope it doesn't affect anything else".
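Put together, the four namespaces fit into a single `unshare` call. This is a sketch under the same assumptions as before (util-linux `unshare`, unprivileged user namespaces enabled); `mini_container` is a hypothetical name, not a tool from the talk.

```shell
#!/bin/sh
# User, mount, pid and network namespaces in one unshare call.
mini_container() {
    unshare --map-root-user --mount --pid --mount-proc --fork --net "$@"
}

mini_container sh -c '
    echo "uid: $(id -u)"       # container-local root
    echo "pid: $$"             # first process in the pid namespace
    mount -t tmpfs tmpfs /tmp  # private /tmp, invisible to the host
    cat /proc/net/dev          # interfaces of the new network namespace
' || echo "namespaces not available here" >&2
```

The new network namespace starts with only a loopback device, so binding to :80 inside it cannot clash with anything on the host.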
15/22
File systems and COW
A container:
- Needs a file system.
- Starts quickly regardless of size.
Do not want to copy 1 GB with every startup.
Copy On Write!
lvm? zfs? btrfs?
16/22
A quick demo
- Create tank/images/debian@latest
- Create tank/containers/t1 from @latest
- unshare --mount --pid --fork chroot . bash
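The demo can be sketched with ZFS snapshots and clones. Dataset names are from the slide; the `run` helper, the dry-run default and the assumption that the pool `tank` is mounted at `/tank` are mine. Executing it for real requires root and an existing pool.

```shell
#!/bin/sh
# Dry run by default: print each command, execute only when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    printf '+ %s\n' "$*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

zfs_demo() {
    # Base image: a dataset holding a Debian root filesystem.
    run zfs create -p tank/images/debian || return
    # (Populate /tank/images/debian here, e.g. with debootstrap.)
    run zfs snapshot tank/images/debian@latest || return
    # Copy-on-write clone: shares blocks with the snapshot, no data copied.
    run zfs clone -p tank/images/debian@latest tank/containers/t1 || return
    # Enter the clone. Assumes the pool is mounted at /tank.
    run cd /tank/containers/t1 || return
    run unshare --mount --pid --fork chroot . bash
}

zfs_demo
```

Because the clone only records differences from `@latest`, creating `t1` takes about the same time whether the image is 10 MB or 1 GB.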
18/22
Leftover elephants in the room
- Trivial to escape this "container".
- Sec: no leftover file descriptors.
- Resource fairness.
- Sec/DoS: shared kernel resources.
- Supervision, daemonization and cleanup.
- Logging.
- Collect zombie processes.
- Image management.
Should someone else do it?
19/22
We almost have a container engine
- But look at my conclusions again.
- Devil hides in the details.
- Tooling companies (Docker, CoreOS, etc.) raised > $10^8.
20/22
To recap
- Easy to understand kernel facilities.
- Devil hides in the details.
- Either spend a lot of time and headache, or re-use existing tools.
22/22
We’re hiring!
Uber SRE locations: SF, NYC, Seattle, Vilnius.
- Check out join.uber.com
- Also, contact me at motiejus@uber.com