The pain and gains running Docker in live @Pipedrive Renno Reinurm 17.01.17


● Pipedrive helps small businesses control the complex selling process

● Founded in 2010
● 30,000 paying customers worldwide
● 200+ employees
● Offices in Tallinn, Tartu and New York, NY


Why use Docker?

● Growth pains with Chef
● New language + new tools = entry barrier
● You write recipes so seldom that you forget how it's done
● But it runs fine in test!

Our early Docker platform started with evaluating Docker running inside a Vagrant box.

Instead, we started to use a custom-built docker-machine.

Lately, we have moved to Docker4Mac.

First use case for containers

Provisioning on-demand test environments per branch.

This was implemented only for the test coverage suite execution environment.

It took a lot of custom hacks to make it work.
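A minimal sketch of one small piece of such a setup, assuming each test environment is named after its Git branch: branch names like `feature/login` are not valid Docker image tags, so they have to be sanitized first. The function names and the `<service>-test-<branch>` naming scheme are hypothetical; the character rules (letters, digits, `_`, `.`, `-`, no leading `.` or `-`, at most 128 characters) are Docker's image tag rules.

```python
import re

# Docker image tags: letters, digits, '_', '.', '-'; must not start with '.' or '-';
# at most 128 characters.
_INVALID_TAG_CHARS = re.compile(r"[^A-Za-z0-9_.-]+")
MAX_TAG_LENGTH = 128


def branch_to_tag(branch: str) -> str:
    """Turn a Git branch name into a valid Docker image tag."""
    tag = _INVALID_TAG_CHARS.sub("-", branch)  # 'feature/login' -> 'feature-login'
    tag = tag.lstrip(".-")                     # first character must be [A-Za-z0-9_]
    tag = tag[:MAX_TAG_LENGTH]
    return tag or "branch"                     # hypothetical fallback for empty results


def branch_env_name(service: str, branch: str) -> str:
    """Hypothetical naming scheme: one test environment per service and branch."""
    return f"{service}-test-{branch_to_tag(branch)}"
```

The sanitized value can then serve both as the image tag and as the container `--name` for that branch's environment.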

Docker infrastructure v1

The first Docker builds used the Codeship Docker CI beta

The first use of Tutum (Docker Cloud) as the orchestration service

Yeah, we were using Docker, but

CI processes with Codeship were slow: the Docker build alone took ~15 minutes

Deployment to the Docker Tutum cluster took another ~10 minutes

Sometimes it was so slow we wondered if it was still working

Stability issues: we experienced “data loss” and “service downtime”

The Birth of Docker Infrastructure v2.0

Requirements:

Improve the speed of CI processes

Improve the reliability of Docker Infrastructure

Docker Infrastructure v2.0

● Jenkins for automating processes
  ○ Docker image builds
  ○ Container deployment
● Docker Swarm
  ○ Container scheduler
● Shipyard
  ○ Troubleshooting

Pain 1

You shall not spend more than 5 minutes building, testing and deploying a Docker container

Based on: xkcd.com

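Improved Docker builds

First iteration:

FROM node
ENV SERVICE_NAME=statistics
ENV SERVICE_DESC="Statistics"
ENV SERVICE_TAGS=statistics
ENV SERVICE_CHECK_HTTP=/health
ENV SERVICE_CHECK_INTERVAL=10s
ENV SERVICE_CHECK_TIMEOUT=5s
EXPOSE 8000
WORKDIR /src
COPY . /src/
RUN npm install
CMD ["node", "."]

Improved:

# Pinned, small Alpine-based Node.js 6 image instead of the implicit node:latest
FROM node:6-alpine
# One ENV instruction instead of six
ENV SERVICE_NAME=statistics \
    SERVICE_DESC="Statistics" \
    SERVICE_TAGS=statistics \
    SERVICE_CHECK_HTTP=/health-statistics \
    SERVICE_CHECK_INTERVAL=10s \
    SERVICE_CHECK_TIMEOUT=5s
EXPOSE 8000
WORKDIR /src
# Run as the unprivileged "node" user that the official node images provide
USER node
CMD ["node", "."]
# Copy the files that change most often last, so the layers above stay cached
COPY libraries/ /src/
COPY src/ /src/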

https://youtu.be/X_q2l8hotAc?t=365
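Why the instruction order matters can be shown with a toy model of Docker's build cache: once one step misses the cache, that step and every step after it are rebuilt. The sketch below is a simplification under a stated assumption: it only tracks which build-context paths each step reads (real Docker also invalidates on changed instructions and base images). `Step` and `rebuilt_steps` are hypothetical names; the two step lists mirror the Dockerfiles above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set


@dataclass(frozen=True)
class Step:
    instruction: str
    inputs: FrozenSet[str] = frozenset()  # build-context paths this step reads


def rebuilt_steps(steps: Sequence[Step], changed: Set[str]) -> List[str]:
    """Instructions that must re-run after the paths in `changed` were edited.

    Toy model of the layer cache: the first step whose inputs changed misses
    the cache, and every later step is rebuilt on top of the new layer.
    """
    for i, step in enumerate(steps):
        if step.inputs & changed:
            return [s.instruction for s in steps[i:]]
    return []


FIRST_ITERATION = [
    Step("FROM node"),
    Step("ENV SERVICE_*"),
    Step("EXPOSE 8000"),
    Step("WORKDIR /src"),
    Step("COPY . /src/", frozenset({"package.json", "libraries", "src"})),
    Step("RUN npm install"),
    Step('CMD ["node", "."]'),
]

IMPROVED = [
    Step("FROM node:6-alpine"),
    Step("ENV SERVICE_*"),
    Step("EXPOSE 8000"),
    Step("WORKDIR /src"),
    Step("USER node"),
    Step('CMD ["node", "."]'),
    Step("COPY libraries/ /src/", frozenset({"libraries"})),
    Step("COPY src/ /src/", frozenset({"src"})),
]
```

In this model, editing application code in the first iteration reruns `npm install`, while in the improved file only the final `COPY src/ /src/` is rebuilt.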

Deployment process optimizations

NB! https://docs.docker.com/engine/userguide/storagedriver/selectadriver/

Replacing Devicemapper with AUFS reduced deployment time 10x.

There are still improvements possible:

● Handle Linux signals
● Parallel rolling updates
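A minimal sketch of parallel rolling updates: replace a service's containers a few at a time rather than one by one, and stop as soon as a batch comes up unhealthy. `batches`, `rolling_update` and the `redeploy`/`is_healthy` callbacks are hypothetical names standing in for whatever the deployment tooling actually calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def batches(items: Sequence[str], size: int) -> List[List[str]]:
    """Split instances into consecutive groups of at most `size`."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def rolling_update(
    instances: Sequence[str],
    parallelism: int,
    redeploy: Callable[[str], None],    # hypothetical: swap one container to the new image
    is_healthy: Callable[[str], bool],  # hypothetical: health check of the new container
) -> Tuple[List[str], List[str]]:
    """Update `parallelism` instances at a time and halt at the first unhealthy batch.

    Returns (updated, failed): instances from fully healthy batches, and the
    unhealthy members of the batch that stopped the rollout. Later batches
    are not touched.
    """
    updated: List[str] = []
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        for batch in batches(instances, parallelism):
            list(pool.map(redeploy, batch))             # redeploy the whole batch concurrently
            health = list(pool.map(is_healthy, batch))  # then check every new container
            failed = [inst for inst, ok in zip(batch, health) if not ok]
            if failed:
                return updated, failed
            updated.extend(batch)
    return updated, []
```

The batch size trades rollout speed against how much capacity is out of rotation at once. Docker's built-in Swarm mode offers the same knob as `docker service update --update-parallelism`.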

https://teespring.com/sigkill
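Handling signals matters because `docker stop` sends SIGTERM, waits (10 seconds by default) and then sends SIGKILL. With an exec-form `CMD ["node", "."]` the application runs as PID 1 in the container, and the kernel applies no default action to signals that PID 1 has not installed a handler for, so an application without a handler simply ignores SIGTERM until it is killed. The sketch below is in Python rather than Node.js and uses hypothetical names (`GracefulShutdown`, `run`, `handle_next`); it only illustrates the pattern: turn SIGTERM into a flag, finish the current unit of work, then exit cleanly.

```python
import signal
import threading
from typing import Callable, Optional


class GracefulShutdown:
    """Turns SIGTERM (sent by `docker stop`) and SIGINT into a flag the main loop checks."""

    def __init__(self) -> None:
        self.requested = threading.Event()
        # signal.signal() must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_signal)
        signal.signal(signal.SIGINT, self._on_signal)

    def _on_signal(self, signum, frame) -> None:
        # Keep the handler tiny: just record that a shutdown was requested.
        self.requested.set()


def run(shutdown: GracefulShutdown,
        handle_next: Callable[[], None],
        max_iterations: Optional[int] = None) -> int:
    """Process work until shutdown is requested; the item in progress always finishes."""
    handled = 0
    while not shutdown.requested.is_set():
        handle_next()  # hypothetical unit of work, e.g. one request or queued job
        handled += 1
        if max_iterations is not None and handled >= max_iterations:
            break
    # Here a real service would stop accepting connections, drain, flush logs and exit 0.
    return handled
```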

Pain 2

Consumers shall connect only to healthy services

Beware of service discovery corruption

● Always enable health checks

● Use unique health check endpoints or validate the response

SERVICE_CHECK_HTTP=/health

vs

SERVICE_CHECK_HTTP=/statistics-health
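One way a generic `/health` endpoint can corrupt service discovery: after a container is rescheduled, a different service may end up answering on the same host and port, and its `/health` returns 200 as well, so the check keeps the wrong instance registered. Besides using a unique path, the checker can validate the body. A minimal sketch, assuming a hypothetical JSON health response of the form `{"service": "statistics", "status": "ok"}`; the function name is also hypothetical.

```python
import json


def is_expected_service_healthy(status: int, body: bytes, expected_service: str) -> bool:
    """Accept a health response only if it is a 200 from the service we expect.

    A plain 200 from whatever now listens on that host:port is not enough.
    """
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return (
        isinstance(payload, dict)
        and payload.get("service") == expected_service
        and payload.get("status") == "ok"
    )
```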

Pain 3

Everyday maintenance of Jenkins jobs

Pain 4

A container shall handle 10,000 connections and constant high load.

https://youtu.be/PivpCKEiQOQ

We deployed a Killer-Container to the cluster and rescheduled it every time it managed to crash a Docker host

Issues

● Linux kernel 3.13
● Fluentd logging agent
● Graylog logging driver
● Kernel sysctl parameters
● Swap usage
● PEBKAC
  ○ "net.ipv4.ip_forward" => 0

● WARNING: No memory limit support
● WARNING: No swap limit support
● WARNING: No kernel memory limit support
● WARNING: No oom kill disable support
● WARNING: No cpu cfs quota support
● WARNING: No cpu cfs period support
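Some of these issues can be caught before a host joins the cluster. A minimal sketch of such a preflight check, with hypothetical function names: it reads `net.ipv4.ip_forward` from `/proc/sys` (Docker's bridge networking needs forwarding enabled, so a value of 0 stops container traffic from being forwarded off the host) and collects the `WARNING:` lines that `docker info` prints, which point at kernel features, such as memory or CPU limits, that the daemon cannot use on that host.

```python
from pathlib import Path
from typing import List, Union


def sysctl_path(name: str, root: Union[str, Path] = "/proc/sys") -> Path:
    """Map a sysctl key such as 'net.ipv4.ip_forward' to its file under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name: str, root: Union[str, Path] = "/proc/sys") -> str:
    """Read the current value of a sysctl key (Linux only)."""
    return sysctl_path(name, root).read_text().strip()


def docker_info_warnings(info_output: str) -> List[str]:
    """Collect the 'WARNING: ...' lines from `docker info` output."""
    return [
        line.strip()
        for line in info_output.splitlines()
        if line.strip().startswith("WARNING:")
    ]


def preflight_problems(info_output: str, ip_forward: str) -> List[str]:
    """Return everything that should keep this host out of the cluster."""
    problems = docker_info_warnings(info_output)
    if ip_forward.strip() != "1":
        problems.append("net.ipv4.ip_forward is not 1: container traffic will not be forwarded")
    return problems
```

On a real host the inputs could come from `subprocess.run(["docker", "info"], capture_output=True, text=True)`, passing both `stdout` and `stderr` since the warnings may be written to either stream depending on the Docker version, and from `read_sysctl("net.ipv4.ip_forward")`.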

Service risk mitigation

● Number of nodes in the cluster
● Spreading policies
● Multiple instances
● Memory limitations
● Healing policies
  ○ Autorestart
  ○ Reschedule
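Spreading policies and multiple instances work together: replicas of one service should land on different nodes, preferring the least loaded ones, so a single crashed host only takes out one replica. A simplified sketch of that placement rule, with a hypothetical `spread_place` function and an in-memory map of containers per node; real schedulers such as Docker Swarm's spread strategy also take available CPU and RAM into account.

```python
from typing import Dict, List


def spread_place(running: Dict[str, int], replicas: int) -> List[str]:
    """Choose a node for each replica of one service.

    Prefer nodes that do not run this service yet, then the node with the
    fewest containers; ties go to the alphabetically first node name.
    """
    if not running:
        raise ValueError("cluster has no nodes")
    load = dict(running)  # containers per node, updated as replicas are placed
    used = set()          # nodes that already got a replica of this service
    placement: List[str] = []
    for _ in range(replicas):
        node = min(sorted(load), key=lambda n: (n in used, load[n]))
        load[node] += 1
        used.add(node)
        placement.append(node)
    return placement
```

Healing and memory policies map onto Docker options too: `docker run --restart=always` restarts a crashed container in place, and `--memory` caps how much RAM a single container can take from the host, provided the host kernel supports memory limits.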

Gains

Evolution of applications

Applications became generic enough to run in multiple regions and environments

Delivery time from idea to live

From 2 weeks to 1 day

Servers vs Services

Servers and services can be managed asynchronously

Statistics

~70 in-house built Dockerized services

~90 Docker images

~500 containers running

3200 container deploys since October

Remember: every day

1 new container is born to stay @Pipedrive

30 container deployments
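The two deployment figures are consistent with each other. A quick check, assuming "since October" means from 1 October 2016 up to the date of this talk (17 January 2017):

```python
from datetime import date

# Assumption: "since October" = 1 October 2016 up to the talk on 17 January 2017.
start = date(2016, 10, 1)
talk = date(2017, 1, 17)

days = (talk - start).days     # 108 days
deploys_per_day = 3200 / days  # about 29.6, i.e. roughly 30 a day
```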

Recommendations for going live with Docker

● You still need to take care of the OS
● Read GitHub issues
● Read the source
● Keep it up to date
● (Performance) test it

Thank you!

Give me your feedback

@rreinurm