6 Months Sailing with Docker in Production by @hunglin at @VideoBlocks


About
● A media company - Creative Content Everyone Can Afford
● Since 2011, we have saved our customers more than $5B
● We reached 1M clips of inventory + $1M in marketplace sales faster than any company in history
● 15 engineers (60+ employees in total)
● 9M requests per day, peaking at 300 requests per second
● deploy about 3 times a week

About Me
● Data Handyman @ Videoblocks
● codingphilosophy.com
● Organizer of DC Scala meetup (next one on 3/23)

Once Upon a Time...
● Infrastructure: web, redis, mysql servers + firewall/load balancer, all managed by Rackspace
● Source Control: github
● Unit Test: CircleCI
● Integration Test: Homemade tool + selenium
● Configuration Management: Chef (by Rackspace)
● Deploy/Rollback: Capistrano (by Rackspace)
● Monitoring: New Relic

Our Team Grows

Goals (or dreams)
❏ Local Dev === Production
❏ Setup Environments with One Command (Push Button Deploy)
❏ Resilient + Autoscaling
❏ Central Logging, Monitoring, and Metrics (Composable)
❏ Add New Service with One Config File
❏ Impenetrable Security
❏ Cloud Provider Independent
❏ Cost Efficient
❏ Minimal Development Time

Plans (in theory)
❏ dockerize everything
❏ run docker everywhere (docker-compose)
❏ docker cluster handles high availability + autoscaling + new service with one config + security
❏ handle credentials with secret management service
❏ cluster-aware log/metrics collection + common interface
❏ it's docker, even Microsoft Azure supports it
❏ use the right tools

Dockerize Everything
● configurable by Environment Variables
● build time (docker hub + docker machine glitch)
● image size (use alpine)
● docker version (use build machine)
● no state in docker image (use flocker)
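The first bullet above, configuration through environment variables, can be sketched as a pair of helpers for a container entrypoint. This is a minimal sketch, not the entrypoint used in the talk: the function names `env_or_default` and `require_env` are hypothetical, and the variable names `SB_ENVIRONMENT` and `MYSQL_HOST` are borrowed from the `docker run` command shown on the "The Replacement" slide.

```shell
#!/bin/sh
# Hypothetical entrypoint helpers (names are assumptions, not from the talk).
# The image stays identical across environments; only the variables differ.

# env_or_default NAME DEFAULT: print $NAME, or DEFAULT when unset or empty.
env_or_default() {
    eval "_value=\${$1:-}"
    printf '%s\n' "${_value:-$2}"
}

# require_env NAME: print $NAME, or fail with a message when unset or empty.
require_env() {
    eval "_value=\${$1:-}"
    if [ -z "$_value" ]; then
        echo "missing required environment variable: $1" >&2
        return 1
    fi
    printf '%s\n' "$_value"
}
```

An entrypoint could then start with something like `MYSQL_HOST=$(require_env MYSQL_HOST) || exit 1`, so a misconfigured container fails at startup rather than at the first database call.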

Run Docker Everywhere
● linux only
● need to restart/cleanup docker-machine often
● benefit of PHP is gone
● setting up IDE/debugger is tricky
● production-only tools (new relic)
● handling stateful services (in dev and prod)
● docker-compose can be more programmable
● glitches with cpu, memory, and disk
● docker has a new version every month

Docker Cluster
● Apache Mesos
● CoreOS fleet
● AWS ECS
● Kubernetes
● docker swarm

The Replacement

An EC2 instance running three containers:
● loggly container
● fluentd container
● webhead container

HOST_PRIVATE_IP=$(curl http://169.254.169.254/latest/meta-data/local-ipv4)

docker run -d -p 514:514 -p 514:514/udp \
  --restart=always \
  -e TOKEN=secret \
  -e TAG={{ sb_env }}-{{ sb_site }}-webhead \
  --name loggly \
  sendgridlabs/loggly-docker

docker run -d -p 24224:24224 \
  --restart=always \
  --log-driver syslog --log-opt syslog-address=udp://localhost:514 \
  -e FLUENTD_CONF="{{ 'prod.conf' if sb_env == 'prod' else 'fluent.conf' }}" \
  --name fluentd \
  videoblocks/fluentd:{{ sb_config.common.fluentd.docker_image_tag }}

docker run -d -p 80:80 -p 443:443 -p 3000:3000 -p 3001:3001 -p 3002:3002 \
  --log-driver syslog --log-opt syslog-address=udp://localhost:514 \
  -e SB_ENVIRONMENT='{{ sb_env }}' \
  -e SB_SITE='{{ sb_site }}' \
  -e FLUENT_LOGGER_HOST=$HOST_PRIVATE_IP \
  -e FLUENT_LOGGER_IS_ENABLED='true' \
  -e SPHINX_IP='{{ sb_site }}-sphinx.internal' \
  -e MYSQL_HOST='{{ sb_site }}-rds.internal' \
  -e MYSQL_USERNAME='{{ sb_config[sb_env][sb_site].db.user }}' \
  -e MYSQL_PASSWORD='{{ sb_config[sb_env][sb_site].db.password }}' \
  -e CACHE_REDIS_HOST='{{ sb_site }}-redis.internal' \
  -e JOB_QUEUE_REDIS_HOST='{{ sb_site }}-redis.internal' \
  --name {{ sb_env }}-{{ sb_site }}-webhead \
  {{ sb_config[sb_env].webhead.docker_image }}:{{ deploy_tag }}

The Infrastructure Of One Service

Deployment
● no long tail of DNS
● maybe more expensive than a rolling update, but you have instant rollback to the exact previous code + environment
● but there is no db migration rollback, so we run the migration first
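The ordering implied by these bullets can be sketched as a dry-run plan. This is a sketch under assumptions: the slides do not show how traffic is switched, so the functions `deploy_plan` and `rollback_plan` and the step names they print (`migrate-db`, `launch-stack`, `switch-traffic`, and so on) are hypothetical. They only print the order of operations and call no real tool.

```shell
#!/bin/sh
# Hypothetical dry-run planner: prints steps only, runs nothing.

# deploy_plan NEW_TAG PREV_TAG: print the deploy steps in order.
deploy_plan() {
    new_tag=$1
    prev_tag=$2
    # Migrations cannot be rolled back, so they go first and have to
    # remain compatible with the code that is still serving traffic.
    echo "migrate-db $new_tag"
    echo "launch-stack $new_tag"
    echo "wait-healthy $new_tag"
    echo "switch-traffic $new_tag"
    # The previous stack is kept, so rollback is just a switch back.
    echo "keep-standby $prev_tag"
}

# rollback_plan PREV_TAG: switch back; the schema stays as migrated.
rollback_plan() {
    echo "switch-traffic $1"
}
```

Calling `deploy_plan v2 v1` lists the migration before any new instance is launched, and `rollback_plan v1` contains no database step, which mirrors the trade-off on the slide.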

Wait A Minute... You are using Docker as AMI

Yes! Docker Image as AMI 2.0
● can run on developers' laptops
● builds a lot faster
● a lot smaller and cheaper
● composable on a single EC2 instance, no/minimal configuration management

Compare to Original Goals with Docker Cluster
✓ high availability by ELB health check (careful #1)
✓ auto scaling by ASG scaling policy (careful #2)
❏ add new service with one config
✓ security

Log and Metrics Collection
✓ cluster awareness
✓ common interface: stdout, http, syslog
❏ in app monitoring (new relic)

Cross Cloud Providers
● need a docker cluster as the interface for endpoint, service discovery, autoscaling, scheduling, resource allocation, etc...
● data is the high gravity part

Use The Right Tools

When you have a hammer, everything looks like your thumb.

The Good - what we achieved
✓ Local Dev ~= Production (without stateful service)
➢ Setup Environments with One Command (Push Button Deploy)
✓ Resilient + Autoscaling
➢ Central Logging, Monitoring, and Metrics (Composable)
❏ Add New Service with One Config File
➢ Impenetrable Security
❏ Cloud Provider Independent
➢ Cost Efficient
➢ Minimal Development Time

The Bad - the limitations we faced
● docker pull issues (docker hub, bugs, slow)
● running docker on non-linux machines is not stable
● docker cluster is not ready (or not easy enough for me)
● not all tools can be controlled by env variables
● in app monitoring tool cannot be outside the container

The Ugly - the tradeoffs we made
● cloudformation - infrastructure as code
  ○ it has mutable state
  ○ it's more like db migration
● duplicate docker images due to tools (new relic) or speed (base image)
● alpine for image size

The New Hope
● docker cluster is finally ready? - we'll find out
● new networking capabilities
● flocker to handle state (data volume)
● monitoring tool like sysdig
● infrastructure management (take snapshot)

Questions?