greg-cockburn
Stories from the trenches
Who am I?
Chief Cloud Officer - Bulletproof
@gergnz
Who else is to blame?
• Qamal Kosim-Satyaputra, BCG Digital Ventures
• Stuart Grice, Grice Barrett Consulting
• Maciej Drożdżowski, A Grumpy Polish Bloke
What did we build?
• 2 Rancher Clusters, managing 6 Kubernetes Clusters
• Deployment Tooling: kube-services
• Cluster Management Tooling: combat-wombat, monitoring-meerkats, rolling-raccoon
• Operations Tooling
• PCI-DSS, ISO27001
How did we do it?
• Terraform, Ansible, AWS, Trend Micro Deep Security, Splunk, Vault, Packer, Skeddly, Amazon Linux
• GitHub, GitLab, Travis CI, Bintray, Jenkins, Kafka, Ruby, Python, Docker, Java
Let’s Dig In
AWS Account Separation
Rancher + Kubernetes
Rancher
Demo
What could possibly go wrong?
Amazon Linux/Docker
• Docker version change after upgrade
• Slow startup time
• cgroup location
• /var/lib/docker.sock becomes a dir
- name: install docker
  yum: name=docker-1.12.6
- --max-concurrent-downloads 128
docker pull <allthethings> (in packer)
- sudo mount -t tmpfs tmpfs /sys/fs/cgroup
sudo sed -i 's|cgroup|sys/fs/cgroup|' /etc/cgconfig.conf
- ¯\_(ツ)_/¯ - never worked this one out
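We never found the root cause of the socket turning into a directory, but the symptom is cheap to detect before containers that bind-mount the socket are started. A minimal sketch, assuming a pre-flight check run on the host; `classify_socket_path` is a hypothetical helper, not part of Docker or any tool named in these slides, and the path to check is supplied by the caller.

```python
import os
import stat


def classify_socket_path(path: str) -> str:
    """Report what actually lives at `path`.

    Returns "socket", "directory", "missing" or "other". A "directory"
    result is the failure mode above: a directory was created where the
    Docker daemon's Unix socket should be.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"
```

A provisioning script could call `classify_socket_path(...)` on the socket path and refuse to continue unless it returns `"socket"`, which at least turns a confusing container failure into an obvious host error.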
Docker Images
Java + musl libc DNS
JUST DON’T!!!!!!!!!!!!!!!!
Java + musl libc DNS (cont.)
JAVA_OPTS="-Dsun.net.spi.nameservice.provider.1=default -Dsun.net.spi.nameservice.provider.2=dns,sun -Dsun.net.spi.nameservice.nameservers=<vpc endpoint>"
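The three properties must be separate, space-separated `-D` flags; if JAVA_OPTS is templated by a provisioning script, a small helper keeps them from being glued together. A minimal sketch, assuming the resolver address (the `<vpc endpoint>` placeholder) is passed in by the caller; `java_dns_opts` is a hypothetical name. These are JDK-internal `sun.*` properties, so check that your JVM version still honours them.

```python
import shlex


def java_dns_opts(nameserver: str) -> str:
    """Build a JAVA_OPTS fragment listing the platform resolver ("default")
    first and the JNDI DNS provider ("dns,sun") pointed at `nameserver` second."""
    props = [
        ("sun.net.spi.nameservice.provider.1", "default"),
        ("sun.net.spi.nameservice.provider.2", "dns,sun"),
        ("sun.net.spi.nameservice.nameservers", nameserver),
    ]
    # One -D flag per property, joined with spaces; shlex.quote only adds
    # quotes if a value contains shell metacharacters.
    return " ".join(shlex.quote(f"-D{key}={value}") for key, value in props)
```

For example, with an illustrative resolver address, `java_dns_opts("10.0.0.2")` yields the same three flags as the JAVA_OPTS line above, with `10.0.0.2` in place of `<vpc endpoint>`.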
Magical Rancher CNI and KubeDNS goes here
Rancher
• RDS resources exhausted
• Host cleanup
- Make it bigger, much bigger
- Build combat-wombat
combat-wombat
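combat-wombat itself isn't reproduced in these slides, but the core decision in any host clean-up job is which hosts are safe to remove. A minimal sketch, assuming each host record carries a state string and a last-seen timestamp; the `Host` fields, the `"disconnected"` state name and the one-hour grace period are illustrative assumptions, not Rancher's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class Host:
    # Hypothetical host record; a real job would populate it from the orchestrator's API.
    name: str
    state: str
    last_seen: datetime


def hosts_to_remove(hosts: Iterable[Host], now: datetime,
                    grace: timedelta = timedelta(hours=1)) -> List[str]:
    """Return (sorted) names of hosts disconnected for longer than `grace`.

    Hosts in any other state are never selected, and a recently
    disconnected host gets the grace period to come back first.
    """
    return sorted(
        h.name
        for h in hosts
        if h.state == "disconnected" and now - h.last_seen > grace
    )
```

Keeping the selection logic separate from the code that actually deletes hosts makes it easy to run in a dry-run mode first.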
Kubernetes
• T2 instance CPU credit exhaustion
• etcd split-brain
• anti-affinity
• Launch configuration UpdatePolicy
- etcd gets really busy
don’t use T2s, use Cs
- etcdctl cluster-health
disaster
- JSON embedded inside YAML for beta/alpha features
sleepy-sloth
- rolling-raccoon
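When `etcdctl cluster-health` starts reporting unhealthy members, the question is whether the survivors still form a quorum. etcd (via Raft) needs a strict majority of voting members to elect a leader and accept writes. The arithmetic, as a small sketch; `etcd_quorum` is our own helper, not part of etcdctl.

```python
from typing import Tuple


def etcd_quorum(members: int) -> Tuple[int, int]:
    """Return (quorum, failures_tolerated) for `members` voting etcd members.

    Raft needs a strict majority of members to elect a leader and commit writes.
    """
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    quorum = members // 2 + 1
    return quorum, members - quorum


if __name__ == "__main__":
    for n in range(1, 8):
        q, tolerated = etcd_quorum(n)
        print(f"{n} members: quorum {q}, survives {tolerated} failure(s)")
```

Adding a fourth member to a three-node cluster raises the quorum to 3 without letting you lose any more nodes (both sizes tolerate one failure), which is why odd cluster sizes are the usual recommendation.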
sleepy-sloth
rolling-raccoon
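rolling-raccoon's internals aren't shown in these slides. The general shape of a rolling node replacement (replace a few nodes at a time, and only move on once the batch is healthy) can be sketched as follows. `drain`, `replace` and `healthy` are hypothetical callbacks you would wire to kubectl and the AWS APIs, and the batch size is an assumption to tune.

```python
from typing import Callable, Iterator, List, Sequence


def batches(nodes: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` nodes."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for i in range(0, len(nodes), size):
        yield list(nodes[i:i + size])


def rolling_replace(nodes: Sequence[str], size: int,
                    drain: Callable[[str], None],
                    replace: Callable[[str], str],
                    healthy: Callable[[str], bool]) -> List[str]:
    """Drain and replace `nodes` one batch at a time; return the new node names.

    Raises as soon as a replacement in the current batch is unhealthy, so a
    bad launch configuration affects at most one batch.
    """
    new_nodes: List[str] = []
    for batch in batches(nodes, size):
        for node in batch:
            drain(node)  # e.g. cordon the node and evict its pods (hypothetical)
        replacements = [replace(node) for node in batch]
        unhealthy = [n for n in replacements if not healthy(n)]
        if unhealthy:
            raise RuntimeError(f"unhealthy replacements: {unhealthy}")
        new_nodes.extend(replacements)
    return new_nodes
```

Passing the side effects in as callbacks keeps the batching and stop-on-failure logic testable without touching a real cluster.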
What did we learn?
SoDD: Stackoverflow Driven Development
If at first you don’t succeed: Double Tap
Requirements/Specifications change: use code to build generalised building blocks
The upgrade loop never stops: Fix bug, incur tech debt, raise issue/PR, wait for next release, upgrade, rinse, repeat.
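"Double tap" in code: retry a flaky operation with a pause before giving up. A minimal sketch; the name `double_tap`, the default of two attempts and the fixed delay are our illustrative choices, not taken from the tooling above.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def double_tap(fn: Callable[[], T], attempts: int = 2, delay: float = 1.0,
               retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call `fn` up to `attempts` times, sleeping `delay` seconds between tries.

    Re-raises the last exception if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)
    # The loop always returns or re-raises; this line only satisfies type checkers.
    raise AssertionError("unreachable")
```

Narrowing `retry_on` to the transient errors you actually expect (timeouts, throttling) avoids retrying genuine bugs.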