Agenda
● Intro
○ Context: Yelp before PaaSTA
○ What's in a PaaS?
● Production-Ready
○ What makes a PaaS production-ready?
● PaaSTA
○ What parts does PaaSTA have?
○ How did we glue them together?
● Wrap-up
○ Lessons learned
○ Next steps
Service Oriented Architecture
Scale our engineering team by splitting our codebase into many smaller parts
Dependency Hell
As services gain adoption, shared libraries become difficult to upgrade. Not all services are Python anymore.
“I wonder how many organizations that say they're "doing DevOps" are actually
building a bespoke PaaS. And how many of those realize it.”
— @markimbriaco
A production-ready PaaS should minimize the impact of both application failures
and PaaS failures
Reduce failure frequency
Use stable components (software, hardware)
You will always have failures.
PaaSTA
● Delivery: Docker
● Scheduling: Mesos + Marathon
● Discovery: Smartstack
● Alerting: Sensu
Delivery in PaaSTA: Docker
● Self-contained artifacts
● Provides software flexibility
● Reproducible builds
● Resource limits make scheduling easier
Scheduling in PaaSTA: Mesos and Marathon
● Mesos is an "SDK for distributed systems", batteries not included.
● Requires a framework
○ Marathon (like ASG for Mesos)
○ Chronos (Periodic tasks)
● Supports Docker as task executor
Marathon
● Run N copies of Docker image
● Works with Mesos to find space on cluster
● Replaces dead instances
[Figure: Mesos architecture diagram, annotated with (Marathon) and (Docker). From http://mesos.apache.org/documentation/latest/mesos-architecture/]
Building Docker images
● Jenkins builds and tests images
● Bless images by creating git tags
○ 1:1 git commit <-> docker image
● Pushes to registry
Shipping Docker images
● Distribution via private registry
● S3 bucket shared among all environments
[Figure: diagram labels: code, metadata, stage, build, prod]
Aside: Declarative Control
● Describe end goal, not path
● Helps achieve fault tolerance
"Deploy 12abcd34 to prod"
vs.
"Commit 12abcd34 should be running in prod"
Gas pedal vs. Cruise Control
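A minimal sketch of the declarative style in Python. The function name `plan_actions` and the dictionary layout are assumptions made for illustration, not PaaSTA's actual code. The desired state is stored as data, and a loop that can be re-run at any time works out what is still missing.

```python
def plan_actions(desired, running):
    """Return the steps that move `running` toward `desired`.

    Both arguments map an environment name to the git commit that
    should be (or currently is) running there.
    """
    actions = []
    for env, commit in sorted(desired.items()):
        if running.get(env) != commit:
            actions.append(("deploy", env, commit))
    return actions


# "Commit 12abcd34 should be running in prod"
desired = {"prod": "12abcd34"}

# Something else is running: one deploy step is planned.
print(plan_actions(desired, {"prod": "00ff00ff"}))

# Already converged: nothing to do, so re-running is harmless.
print(plan_actions(desired, {"prod": "12abcd34"}))
```

If a run dies halfway, the next run recomputes the difference and carries on, which is where the fault tolerance comes from.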
Configuring Marathon
● Need a wall around Marathon: it has root on your entire cluster.
● Cron job
● Combines per-service config and currently-blessed docker image
marathon-$cluster.yaml
● # tasks
● CPU, memory
● How to healthcheck your service
● Bounce strategy
● Command / args
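A rough sketch of how the per-service settings listed above could be merged with the currently-blessed Docker image into a Marathon app definition. The config keys (`instances`, `cpus`, `mem`, `healthcheck_uri`), the function name, and the registry hostname are hypothetical. The output keys follow the Marathon app JSON format as I understand it. Bounce strategy is left out, since it presumably drives the deploy tooling rather than the app definition itself.

```python
def build_marathon_app(service, config, blessed_image):
    """Combine per-service config with the blessed image (illustrative only)."""
    app = {
        "id": "/" + service,
        "instances": config["instances"],   # number of tasks
        "cpus": config["cpus"],
        "mem": config["mem"],
        "container": {
            "type": "DOCKER",
            "docker": {"image": blessed_image},
        },
        "healthChecks": [
            {"protocol": "HTTP", "path": config["healthcheck_uri"]},
        ],
    }
    # Command and args are optional: the image may define its own entrypoint.
    for key in ("cmd", "args"):
        if key in config:
            app[key] = config[key]
    return app


# Hypothetical per-service config and image name.
config = {"instances": 3, "cpus": 0.5, "mem": 512, "healthcheck_uri": "/status"}
app = build_marathon_app("service_1", config, "registry.example.com/service_1:12abcd34")
print(app["id"], app["instances"], app["container"]["docker"]["image"])
```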
Discovery in PaaSTA: SmartStack
● Registration agent on each box writes to ZooKeeper
● Discovery agent on each box reads from ZK, configures HAProxy
Registering with SmartStack
● configure_nerve.py queries local mesos-slave API
● Keeping it local means registration works even if Mesos master or Marathon is down.
● We can register non-PaaSTA services as well
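A sketch of the local lookup, not the real configure_nerve.py. It assumes the state JSON from the local mesos-slave has already been fetched and parsed, and that it nests tasks under `frameworks` and `executors` with ports given as a range string such as `"[31337-31337]"`. The function name `running_services` is hypothetical.

```python
import re


def running_services(state):
    """Return (task name, first port) for every running task on this host."""
    found = []
    for framework in state.get("frameworks", []):
        for executor in framework.get("executors", []):
            for task in executor.get("tasks", []):
                if task.get("state") != "TASK_RUNNING":
                    continue
                ports = task.get("resources", {}).get("ports", "")
                match = re.match(r"\[(\d+)-\d+", ports)
                if match:
                    found.append((task["name"], int(match.group(1))))
    return found


# Hand-written sample in the assumed shape, not real mesos-slave output.
sample_state = {
    "frameworks": [{
        "executors": [{
            "tasks": [
                {"name": "service_1", "state": "TASK_RUNNING",
                 "resources": {"ports": "[31337-31337]"}},
                {"name": "service_2", "state": "TASK_STAGING",
                 "resources": {"ports": "[31338-31338]"}},
            ],
        }],
    }],
}
print(running_services(sample_state))
```

Because only the local agent is consulted, the lookup keeps working while the Mesos master or Marathon is unreachable.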
Architecture: Registration
[Figure: registration diagram. Labels: Service host, service_1, service_2, service_3, hacheck, configure_nerve.py, nerve, ZK, metadata, healthcheck]
ZooKeeper Data
Nerve registers service instance in ZooKeeper:
/nerve/region:myregion
├── service_1
│   └── server_1_0000013614
├── service_2
│   └── server_1_0000000959
├── service_3
│   ├── server_1_0000002468
│   └── server_2_0000002467
[...]
{
  "host": "10.0.0.123",
  "port": 31337,
  "name": "server_1",
  "weight": 10
}
hacheck
Normally hacheck acts as a transparent proxy for healthchecks:
$ curl -s yocalhost:6666/http/service_1/1234/status
{
  "uptime": 5693819.315988064,
  "pid": 2595160,
  "host": "server_1",
  "version": "b6309e09d71da8f1e28213d251f7c"
}
$
Can also force healthchecks to fail before we shut down a service:
$ hadown service_1
$ curl -s yocalhost:6666/http/service_1/1234/status
Service service_1 in down state since 1443217910: krall
$
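The two behaviours above can be modelled in a few lines. This is a toy sketch, not hacheck's implementation. The class and method names are hypothetical, and the 503 status code for a forced-down service is an assumption. The message format is copied from the curl output shown above.

```python
import time


class HealthcheckProxy:
    """Toy model: pass healthchecks through unless the service is marked down."""

    def __init__(self, check_backend):
        # check_backend(service, port) -> (http_status, body)
        self._check_backend = check_backend
        self._down = {}  # service -> (unix timestamp, reason)

    def mark_down(self, service, reason, since=None):
        when = int(time.time()) if since is None else int(since)
        self._down[service] = (when, reason)

    def mark_up(self, service):
        self._down.pop(service, None)

    def status(self, service, port):
        if service in self._down:
            since, reason = self._down[service]
            body = "Service %s in down state since %d: %s" % (service, since, reason)
            return 503, body
        # Transparent case: ask the real service.
        return self._check_backend(service, port)


proxy = HealthcheckProxy(lambda service, port: (200, "OK"))
print(proxy.status("service_1", 1234))
proxy.mark_down("service_1", "krall", since=1443217910)
print(proxy.status("service_1", 1234))
```

Failing the healthcheck first lets load balancers drain traffic away before the process is actually stopped.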
HAProxy
● By default, bind to 0.0.0.0
● Bind only to yocalhost on public-facing servers
● Gives us goodies for all clients:
○ Redispatch on conn failure
○ Easy request logging
○ Rock-solid load balancing
yocalhost
● One HAProxy per host
● What address to bind HAProxy to?
● 127.0.0.1 is per-container
● Add loopback address to host: 169.254.255.254
● This also works on servers without Docker
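A small client-side sketch of what this buys you. The helper name `service_url` and the port 20001 are hypothetical. The address is the one given above. The check with the standard `ipaddress` module shows it is link-local rather than in the per-container 127.0.0.0/8 range.

```python
import ipaddress

YOCALHOST = "169.254.255.254"


def service_url(proxy_port, path="/"):
    """Build a URL that reaches the host's HAProxy from a container or the host."""
    return "http://%s:%d%s" % (YOCALHOST, proxy_port, path)


addr = ipaddress.ip_address(YOCALHOST)
print(addr.is_loopback, addr.is_link_local)
print(service_url(20001, "/status"))
```

Clients use the same address whether or not they run inside Docker, so no per-environment configuration is needed.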
yocalhost
[Figure: network layout on a Docker host]
● docker container 1: lo 127.0.0.1, eth0 169.254.14.17
● docker container 2: lo 127.0.0.1, eth0 169.254.14.18
● host: lo 127.0.0.1, lo:0 169.254.255.254, docker0 169.254.1.1, eth0 10.1.2.3, haproxy
Monitoring a PaaS is different
● Things can change frequently
○ Which boxes run which services?
○ What services even exist?
● Traditional "host X runs service Y" checks don't work anymore.
Monitor the invariants
● N copies of a service are running
● Marathon running on X,Y,Z
● All nodes are running mesos-slave, synapse, nerve, docker
● Cron jobs have succeeded recently
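The first invariant can be checked without knowing which host runs what. A minimal sketch, assuming the expected and observed instance counts are already available as dictionaries. The function name and data layout are hypothetical.

```python
def find_violations(expected_counts, running_counts):
    """Return (service, expected, actual) for services with too few copies."""
    violations = []
    for service, expected in sorted(expected_counts.items()):
        actual = running_counts.get(service, 0)
        if actual < expected:
            violations.append((service, expected, actual))
    return violations


expected = {"service_1": 3, "service_2": 2}
running = {"service_1": 3, "service_2": 1}
print(find_violations(expected, running))
```

The check looks only at counts, so it stays valid when tasks move between boxes.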
Sensu monitoring
● Decentralized checking
● Client executes checks, puts results on a message queue
● Sensu servers handle results from the queue, route them to email, PagerDuty, JIRA, etc.
We can send our own events
try:
    something_that_might_fail()
except Exception:
    send_failure_event()
else:
    send_success_event()
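A fuller sketch of the try/except pattern above. It assumes the local Sensu client accepts JSON check results (`name`, `status`, `output`) on a TCP socket at 127.0.0.1:3030, with status 0 for OK and 2 for critical. The function names are hypothetical, and Yelp's own tooling may wrap this differently.

```python
import json
import socket


def build_event(name, status, output):
    """Serialize a check result for the Sensu client socket."""
    return json.dumps({"name": name, "status": status, "output": output})


def send_event(payload, host="127.0.0.1", port=3030):
    """Send one result to the local Sensu client (assumed address and port)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload.encode("utf-8"))


def run_and_report(name, action, send=send_event):
    """Run `action` and report success or failure as an event."""
    try:
        action()
    except Exception as exc:
        send(build_event(name, 2, "FAILED: %s" % exc))
    else:
        send(build_event(name, 0, "OK"))


# Dry run: collect events in a list instead of opening a socket.
sent = []
run_and_report("setup_marathon_job", lambda: None, send=sent.append)
print(sent[0])
```

Passing `send` as a parameter keeps the reporting logic testable without a running Sensu client.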
App-Infra boundary
Permissive enough for developers to do their job, strict enough to prevent infrastructure from ballooning
Between infra components
The right abstractions can save you a lot of work if you need to swap components
"Evolution versus Revolution"
Iterative improvements find local optima
Sometimes you need to take bigger risks to get bigger rewards
What's next for PaaSTA?
● It's open source now!
● More polish, docs, examples
● Support more technologies
○ Chronos in-progress
○ Docker Swarm?
○ Kubernetes?
Thank you!
Evan Krall
@[email protected]