Containerizing the largest Dutch e-commerce site: The bol.com story

the shop for everyone

bol.com Dutch Container Day presentation




Content

• About me

• About bol.com

• Containers... in production

• Mayfly: the original container use case

• Choices, choices...

• Lessons learned

• Next steps


About me

• Maarten Dirkse (@mdirkse)

• Developer with a history degree, 9+ years of experience (mostly Java)

• I work on the bol.com tools team. We provide the platform the organisation uses to build software: Jenkins, SCM, Mayfly (more on that later)

• Have been running containers in production* for almost 2 years (bol.com has been running containers in production, no asterisk, for a little over a year, but really only for the past 5 months)

* "Production" internally: for developers, not for customers


About bol.com

• Over 6.5 million active customers

• Virtual footprint: almost 1 million visitors per day

• Over 14.5 million products

• Moved to our own data center (DC) two years ago

• VM-based architecture: 1 node per app instance

• Everything is Puppetized, but the configuration was derived from a static source (Racktables)

• We’re hiring! http://banen.bol.com

Brand awareness: > 95% / > 75%


Containers... in production


^^ obligatory container ship pic


Containers... in production

• Several mission-critical apps running in containers... in VMs

• Mesos + Marathon cluster that runs the back-end GUI for the webshop

• Home-grown spidering solution that runs on Google Container Engine (also Mesos on GCE)

• Mesos + Marathon cluster that runs Mayfly...


Mayfly: the original use case


^^ http://mayflycd.github.io/mayfly-talks/


What is Mayfly?

• The team had an idea: let teams develop every service feature in isolation, removing the bottleneck of a shared test environment

• That required an isolated runtime environment for every feature branch (that’s a lot of environments)

• The VM infrastructure was too static, too resource-heavy and too slow


Containers to the rescue!

• Instead of deploying every feature branch as a VM, deploy it as a container

• Using containers meant we could spin up environments in seconds and pack more of them onto the hardware

• And so it was that containers were introduced at bol.com. But...
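To make "one container per feature branch" concrete, here is a minimal sketch (not Mayfly's actual code) of turning a Git branch into a Marathon app definition. It uses the pre-1.5 Marathon JSON layout (`container.docker.network` and `portMappings`); the `/mayfly/<service>/<branch>` id scheme, the registry host `registry.example.internal`, the resource sizes and the `mayfly.branch` label are all hypothetical.

```python
import json
import re


def marathon_app_for_branch(service: str, branch: str, image_tag: str,
                            registry: str = "registry.example.internal") -> dict:
    """Build a Marathon app definition for one feature-branch environment.

    Naming scheme, registry host and resource sizes are hypothetical.
    """
    # Marathon app id segments may only contain lowercase letters, digits,
    # dashes and dots, so turn e.g. "feature/ABC-123_x" into "feature-abc-123-x".
    slug = re.sub(r"[^a-z0-9-]+", "-", branch.lower()).strip("-")
    return {
        "id": f"/mayfly/{service}/{slug}",
        "instances": 1,
        "cpus": 0.5,
        "mem": 512,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": f"{registry}/{service}:{image_tag}",
                "network": "BRIDGE",
                # hostPort 0 lets Mesos pick a free host port for each environment
                "portMappings": [{"containerPort": 8080, "hostPort": 0}],
            },
        },
        "labels": {"mayfly.branch": branch},
    }


if __name__ == "__main__":
    app = marathon_app_for_branch("checkout", "feature/ABC-123_new-checkout", "abc123")
    # This JSON could be POSTed to Marathon's /v2/apps endpoint.
    print(json.dumps(app, indent=2))
```

Because every branch gets its own app id, tearing an environment down again is just deleting that one app.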


DockerCon 2014: docker + ?

Towards “peak container confusion”

• Mesos
• Marathon (or Aurora?)
• Kubernetes
• Synapse & Nerve
• Paasta
• AWS EC2 Container Service (EC2 CS)
• CoreOS + Fleet
• RancherOS
• Spotify Helios

wut?


Choices, choices...


^^ obligatory cat pic


The stack

• After trying Fleet (buggy) and Kubernetes (5 min old), we settled on Mesos + Marathon running on CoreOS/RHEL7 on bare metal

• Consul for service discovery, Kevlar for KV store

• The choices made for Mayfly became the prototype for the bol.com container infrastructure
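For readers unfamiliar with Consul, a minimal sketch of how a container instance can announce itself: it registers with the local Consul agent through the agent's `/v1/agent/service/register` HTTP endpoint, including a health check so that Consul stops handing out unhealthy instances. The service name, the instance id and the `/health` endpoint are hypothetical; the talk does not say how bol.com wired up registration.

```python
import json
import urllib.request


def registration_payload(name: str, instance_id: str, address: str, port: int,
                         health_path: str = "/health") -> dict:
    """Payload for Consul's agent service-registration endpoint."""
    return {
        "ID": instance_id,   # unique per container instance
        "Name": name,        # logical service name that clients look up
        "Address": address,
        "Port": port,
        "Check": {           # Consul marks the instance critical if this check fails
            "HTTP": f"http://{address}:{port}{health_path}",
            "Interval": "10s",
        },
    }


def register(payload: dict, agent: str = "http://127.0.0.1:8500") -> int:
    """PUT the payload to the local Consul agent and return the HTTP status."""
    req = urllib.request.Request(
        f"{agent}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Clients then resolve the service by name (via Consul DNS or its HTTP API) instead of relying on the static host lists a VM landscape would use.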


Dynamic infrastructure is the future!

As the limitations of our VM-based infrastructure became clear, the platform team became convinced that moving to dynamic infrastructure was a necessary step to keep scaling the IT architecture.


But wait, we’re not finished!

• After you’re done installing your new, mind-blowing tech, you realize a lot of loose ends still need to be tied up.

• Deploying Docker to your machines (and which version)? --> Docker Puppet module (https://github.com/garethr/garethr-docker)

• What about logs? --> Logspout (https://github.com/gliderlabs/logspout)

• Zombie processes, service discovery (SD) registration? --> ContainerPilot (https://github.com/joyent/containerpilot)
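Why zombie processes come up at all: the first process in a container runs as PID 1, and PID 1 is expected to reap any child that exits. A typical application was never written to do that, so exited children pile up as zombies. Below is a minimal sketch of the reaping loop an init-style process runs; it illustrates the technique only and is not how ContainerPilot (a Go program that does much more) implements it.

```python
import os
import signal


def reap_children() -> list[tuple[int, int]]:
    """Collect every child that has already exited, without blocking.

    Returns (pid, exit_code) pairs for the children that were reaped.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # no children left at all
            break
        if pid == 0:               # children exist, but none have exited yet
            break
        reaped.append((pid, os.waitstatus_to_exitcode(status)))
    return reaped


def install_reaper() -> None:
    """Reap exited children whenever the kernel delivers SIGCHLD."""
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_children())
```

Looping until `waitpid` reports nothing left matters because several SIGCHLD signals can collapse into one delivery.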


But wait, we’re not finished!

• How do you actually tell Marathon what to deploy? --> Marathon Terraform provider (https://github.com/Banno/terraform-provider-marathon)

• Install a (properly secured) Docker registry. We went with the stock Docker registry behind a secured Nginx reverse proxy

• Base images? We chose the RHEL7 base image as the root of everything (a known quantity in terms of ops support and security vetting)

• And mind how you create images...


BOB

• Needed a way to audit and vet the images that would run in our landscape

• Created BOB, a wrapper tool for docker build and docker push

• BOB checks your Dockerfiles and images, ensuring that they meet company standards, before they’re pushed to the registry

• Nothing gets pushed to the registry if it hasn’t been built by BOB
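The talk does not list BOB's actual checks, so the following is only a sketch of the kind of Dockerfile policy check such a wrapper could run before calling docker build. The three rules (base image must be a pinned tag of an internal RHEL7 image, no `:latest`, an explicit non-root `USER`) and the registry name `registry.example.internal` are hypothetical examples of "company standards".

```python
import re

# Hypothetical approved base image: the internal RHEL7 image, optionally tagged.
APPROVED_BASE = re.compile(r"^registry\.example\.internal/rhel7(:[\w.-]+)?$")


def check_dockerfile(text: str) -> list[str]:
    """Return policy violations found in a Dockerfile (empty list = OK).

    Line-based: continuation lines are treated as separate entries, which is
    harmless here because only FROM and USER are inspected.
    """
    instructions = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        keyword, _, args = line.partition(" ")
        instructions.append((keyword.upper(), args.strip()))

    problems = []
    froms = [args for kw, args in instructions if kw == "FROM"]
    if not froms:
        problems.append("no FROM instruction")
    for image in froms:
        ref = image.split()[0]
        # Look for a tag only in the last path segment, so a registry port
        # such as "host:5000/..." is not mistaken for a tag.
        if ":" not in ref.rsplit("/", 1)[-1] or ref.endswith(":latest"):
            problems.append(f"unpinned base image: {ref}")
        if not APPROVED_BASE.match(ref):
            problems.append(f"base image not approved: {ref}")
    if not any(kw == "USER" for kw, _ in instructions):
        problems.append("no USER instruction; container would run as root")
    return problems
```

A wrapper would refuse to build or push whenever the returned list is non-empty, which is what makes "nothing gets pushed unless BOB built it" enforceable.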


BOB (the builder) running on Jenkins



Use cases

• Mayfly (see above)

• BIZ: lots of small, independently deployable modules with back-office functionality. Stateless, ideal for containerization.

• Spidering: horizontally scalable, stateless processes that run in the cloud.


Lessons learned


^^ nothing funny about this, most of ‘em were learned the hard way


Lessons learned 1/2

• Most of this stuff is relatively new or brand new; expect growing pains

• Don’t run your container orchestration software (Mesos, Marathon) in containers. That way, if Docker dies, your platform doesn’t go down with it.

• Running your apps in a container can sometimes lead to interesting issues that don’t exist outside of containers (JVM memory issues, for instance) --> See https://www.youtube.com/watch?v=6ePUiQuaUos for an example
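One common form of the JVM memory issue: older JVMs size their default heap from the host's total memory rather than from the container's cgroup limit, so the kernel kills the container when the heap grows past that limit. A frequent workaround is to derive `-Xmx` from the cgroup limit in the container's start-up script. The sketch below does that; the 75% fraction is an assumption, and newer JVMs (JDK 10+, 8u191+) are container-aware and offer `-XX:MaxRAMPercentage` instead.

```python
from pathlib import Path

# Where the container's memory limit lives under cgroup v2 and cgroup v1.
LIMIT_FILES = ("/sys/fs/cgroup/memory.max",
               "/sys/fs/cgroup/memory/memory.limit_in_bytes")


def parse_limit(text: str):
    """Return the memory limit in bytes, or None if the container is unlimited."""
    value = text.strip()
    if value == "max":  # cgroup v2 spelling of "no limit"
        return None
    limit = int(value)
    # cgroup v1 reports a huge number (close to 2**63) when no limit is set.
    return None if limit >= 2**60 else limit


def heap_flag(limit_bytes, fraction: float = 0.75) -> str:
    """Size -Xmx as a fraction of the limit, leaving headroom for
    metaspace, thread stacks and other off-heap memory."""
    if limit_bytes is None:
        return ""  # no limit: leave the JVM defaults alone
    return f"-Xmx{int(limit_bytes * fraction) // (1024 * 1024)}m"


def read_limit():
    """Read the limit from whichever cgroup file exists, if any."""
    for name in LIMIT_FILES:
        path = Path(name)
        if path.exists():
            return parse_limit(path.read_text())
    return None
```

For a container limited to 1 GiB, `heap_flag(1073741824)` yields `-Xmx768m`.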


Lessons learned 2/2

• Graphite-style metrics become problematic in a container world. Prometheus exists, but we can’t just switch overnight

• The HAProxy + consul-template combo is pretty brittle; we now use Fabio --> https://github.com/eBay/fabio

• Keep it simple, make small changes. Going from static to dynamic is a sea change that is incredibly hard to oversee. Take small steps that deliver value immediately
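Part of Fabio's appeal over HAProxy + consul-template is that it builds its routing table directly from Consul: a service that registers with a tag like `urlprefix-/orders` gets traffic for that path prefix, with no template to render and no proxy to reload. The following is a simplified sketch of that idea, not Fabio's code. The input is assumed to be a flattened list of healthy instances (Address, Port, Tags), and the longest-prefix selection with host-specific routes preferred is a simplification of Fabio's matching rules.

```python
from collections import defaultdict

PREFIX = "urlprefix-"


def build_routes(services):
    """Map "host/path" route prefixes to backend addresses.

    `services` is assumed to be a flattened list of healthy instances,
    each a dict with Address, Port and Tags.
    """
    routes = defaultdict(list)
    for svc in services:
        for tag in svc["Tags"]:
            if tag.startswith(PREFIX):
                route = tag[len(PREFIX):]  # e.g. "/orders" or "shop.example/"
                routes[route].append(f"{svc['Address']}:{svc['Port']}")
    return dict(routes)


def pick_backends(routes, host: str, path: str):
    """Return the backends for the most specific matching route."""
    best = None
    for route in routes:
        route_host, _, rest = route.partition("/")
        route_path = "/" + rest
        if route_host not in ("", host) or not path.startswith(route_path):
            continue
        # Prefer host-specific routes, then the longest path prefix.
        key = (route_host != "", len(route_path))
        if best is None or key > best[0]:
            best = (key, route)
    return routes[best[1]] if best else []
```

When a container dies and drops out of Consul's healthy set, the next routing-table rebuild simply omits it.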


The cultural shift

• Beware the mindset transition that dev teams will have to go through

• Devs: “What do you mean I can’t ssh into the container?”

• It takes time for ops people to adjust to the idea of dynamic infrastructure. People tend to think from within their own constraints --> ops control over the app runtime will no longer be absolute


Next steps


^^ obligatory lolcat


Next steps

• IP-per-container (needed for per-container firewalls, a.k.a. to get security off our back)

• A per-app service descriptor that drives app infra and config (to replace Hiera data and feed Terraform)

• Migrating ever more apps to the dynamic infrastructure
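The talk only names the goal of a per-app service descriptor, so here is a hypothetical sketch of what one could look like: a small, validated record per app that is rendered into a Terraform variables file (Terraform reads JSON-syntax `.tfvars.json` files). The field names, the validation rules and the idea that a Terraform module declares matching variables are all assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ServiceDescriptor:
    """Hypothetical per-app descriptor, one per app, owned by the dev team."""
    name: str
    image: str
    instances: int = 1
    cpus: float = 0.5
    mem_mb: int = 512
    routes: list[str] = field(default_factory=list)
    config: dict[str, str] = field(default_factory=dict)  # replaces per-app Hiera keys

    def validate(self) -> None:
        if self.instances < 1:
            raise ValueError("instances must be at least 1")
        # Require an explicit tag in the last path segment of the image name.
        if ":" not in self.image.rsplit("/", 1)[-1]:
            raise ValueError("image must carry an explicit tag")


def to_tfvars(desc: ServiceDescriptor) -> str:
    """Render the descriptor as a JSON Terraform variables file.

    Assumes the Terraform module declares variables with these same names.
    """
    desc.validate()
    return json.dumps(asdict(desc), indent=2, sort_keys=True)
```

The design point is a single source of truth: the same descriptor could also feed routing tags or runtime config, instead of spreading an app's settings across Hiera, Terraform and hand-written JSON.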


Thank you! Till next time
