Rancher + Kubernetes; Stories from the trenches

Who am I?

Chief Cloud Officer - Bulletproof

@gergnz

Who else is to blame?

• Qamal Kosim-Satyaputra, BCG Digital Ventures
• Stuart Grice, Grice Barrett Consulting
• Maciej Drożdżowski, A Grumpy Polish Bloke

What did we build?

• 2 Rancher Clusters
  • Managing 6 Kubernetes Clusters
• Deployment Tooling
  • kube-services
• Cluster Management Tooling
  • combat-wombat, monitoring-meerkats, rolling-raccoon
• Operations Tooling
• PCI-DSS, ISO27001

How did we do it?

• Terraform
• Ansible
• AWS
• Trend Micro Deep Security
• Splunk
• Vault
• Packer
• Skeddly
• Amazon Linux

• GitHub
• GitLab
• Travis CI
• Bintray
• Jenkins
• Kafka
• Ruby, Python
• Docker
• Java

Let’s Dig In

AWS Account Separation

Rancher + Kubernetes

Rancher

Demo

What could possibly go wrong?

Amazon Linux/Docker

• Docker version change after upgrade
• Slow startup time
• cgroup location
• /var/run/docker.sock becomes a dir

Fixes (in the same order):

- name: install docker
  yum: name=docker-1.12.6

- --max-concurrent-downloads 128
  docker pull <allthethings> (in packer)

- sudo mount -t tmpfs tmpfs /sys/fs/cgroup
  sudo sed -i 's|cgroup|sys/fs/cgroup|' /etc/cgconfig.conf

- ¯\_(ツ)_/¯ - never worked this one out
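
The last problem on this slide has no fix, but it can at least be detected early. The following is a minimal sketch, not something shown in the deck: it assumes the default Docker socket path `/var/run/docker.sock`, and the function name `classify_docker_socket` is hypothetical. It reports whether the path is a real socket or has turned into a directory.

```python
import os
import stat


def classify_docker_socket(path="/var/run/docker.sock"):
    """Return 'socket', 'directory', 'missing', or 'other' for the given path.

    The default path is an assumption (the standard Docker socket location).
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISDIR(mode):
        # This is the failure mode named on the slide.
        return "directory"
    return "other"


if __name__ == "__main__":
    print(classify_docker_socket())
```

A check like this could run as a boot-time or monitoring probe so that a host with a broken socket is flagged before workloads are scheduled onto it.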

Docker Images

Java Musl-libc DNS

JUST DON’T!!!!!!!!!!!!!!!!

Java Musl-libc DNS Cont.

JAVA_OPTS=-Dsun.net.spi.nameservice.provider.1=default -Dsun.net.spi.nameservice.provider.2=dns,sun -Dsun.net.spi.nameservice.nameservers=<vpc endpoint>

Magical Rancher CNI and KubeDNS goes here
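
The three JVM properties above are separate `-D` flags and must be space-separated when they are passed as one `JAVA_OPTS` value. As a small sketch of how a deployment script might assemble them, assuming a hypothetical helper name `build_dns_java_opts` and a placeholder nameserver address (the deck only says `<vpc endpoint>`):

```python
def build_dns_java_opts(nameserver):
    """Join the three JVM DNS properties from the slide into one JAVA_OPTS value."""
    props = [
        ("sun.net.spi.nameservice.provider.1", "default"),
        ("sun.net.spi.nameservice.provider.2", "dns,sun"),
        ("sun.net.spi.nameservice.nameservers", nameserver),
    ]
    # Each property becomes its own -D flag, separated by a single space.
    return " ".join(f"-D{key}={value}" for key, value in props)


if __name__ == "__main__":
    # 10.0.0.2 is a placeholder, not an address taken from the deck.
    print(build_dns_java_opts("10.0.0.2"))
```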

Rancher

• RDS resources exhausted
• Host clean up

- Make it bigger, much bigger
- Build combat-wombat

combat-wombat

Kubernetes

• T2 instance CPU credit exhaustion
• etcd split-brain
• anti-affinity
• Launch configuration UpdatePolicy

- etcd gets really busy
  don’t use T2s, use Cs
- etcdctl cluster-health
  disaster
- json embedded inside yaml for beta/alpha features
  sleepy-sloth
- rolling-raccoon
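
The "json embedded inside yaml" complaint for anti-affinity can be illustrated with a short sketch. It assumes the alpha-era annotation key `scheduler.alpha.kubernetes.io/affinity`, which older Kubernetes releases used before affinity moved into the pod spec; the deck does not show the actual manifest, so the `app` label, the function name, and the topology key here are illustrative assumptions. Generating the JSON string in code avoids hand-escaping it inside YAML.

```python
import json


def anti_affinity_annotation(app_label, topology_key="kubernetes.io/hostname"):
    """Build a pod annotation whose value is a JSON-encoded anti-affinity rule.

    The annotation key and the 'app' label key are assumptions, not taken
    from the deck.
    """
    affinity = {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {
                        "matchExpressions": [
                            {"key": "app", "operator": "In", "values": [app_label]}
                        ]
                    },
                    "topologyKey": topology_key,
                }
            ]
        }
    }
    # The whole rule is serialised to one JSON string and stored as the
    # annotation value, which is what ends up embedded in the YAML manifest.
    return {"scheduler.alpha.kubernetes.io/affinity": json.dumps(affinity)}


if __name__ == "__main__":
    print(anti_affinity_annotation("etcd"))
```

The returned dictionary can be merged into a manifest's `metadata.annotations` before the manifest is serialised to YAML.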

sleepy-sloth

rolling-raccoon

What did we learn?

SoDD: Stack Overflow Driven Development

If at first you don’t succeed: Double Tap

Requirements/Specifications change: use code to build generalised building blocks

The upgrade loop never stops: Fix bug, incur tech debt, raise issue/PR, wait for next release, upgrade, rinse, repeat.