Scaling A Start-up DevOps Team To 10x

While Scaling The System 50x

Christian Beedgen – Co-Founder & CTO

Stefan Zier – Lead Architect

DevOpsDays Austin 2014

Intro

Christian Beedgen

– Co-Founder, CTO

– ArcSight, Amazon, …

– No prior experience running production systems

Stefan Zier

– Lead Architect, first engineer

– ArcSight, Amazon, …

– No prior experience running production systems

Scaling

Spreading constructive beliefs and behavior from the few to the many.

Robert I. Sutton

Scaling Up Excellence: Getting to More Without Settling for Less

The Sumo Logic Service

Petabyte-scale log management platform

Big Data™, High Velocity, Human Real Time

Distributed

100% in AWS

Service Oriented Architecture

99% in Scala

Run by engineers

Data Ingest

Code Commits, Services

Engineering Head Count (chart; axis from 0 to 60)

The Challenge

Scaling Sumo Logic

– More confidence and uptime

– More operators

– More change

– More services

How We Scaled

DevOps Culture

Spreading Knowledge

Control Surfaces

Culture

a shared, learned, system of values, beliefs and attitudes that shapes and influences perception and behavior — an abstract “mental blueprint” or “mental code.”

Being On Call

One week, 24/7 responsibility for

– Operational decision making

– Alert response

– Deploying the bits

– Configuration changes

Pair of people (primary, secondary)

– Social schedules & travel

– Training

– Relief after a noisy night

Feedback Loops

Sumo on Sumo

– Perfect dogfooding use case

Post mortems

– Drive improvements from incidents

Alerting

– Code I wrote yesterday just woke me up at 4am

Change Management

Mandated for PCI compliance

– Change Management Board = Channel on Slack

– Change Request = JIRA ticket

– Audit trail = Paste Slack conversation into JIRA

Actually helpful

– Good documentation

– Starts good discussions

– Makes change mindful

Spreading Knowledge

Tactical

– Daily Standups

– Chat

– Playbooks

Strategic

– Mentoring

– “How the sausage is made” sessions

– Checklists

Playbooks

Linked to alert

– GitHub wikis

– URL in alert

Focused on MTTR

– Steps to restore service

– List of Subject Matter Experts to call

Continuously improved

– Boy Scout rule

Three Pillars

Culture

Knowledge

Control Surfaces

Checklists

Improve outcomes

– Ensure experts don’t miss any critical steps

– Prevent repeating mistakes

Well designed

– Coherent

– Living documents

– Concise, clear and require specific actions

– Need to be short and well-organized

– Are NOT step-by-step instructions

DevOps Friendly

Control Surfaces matter for scale

– Simplify complex operations

– Consistent view

– Built-in safety

Natural to use

– Easy to learn, discover

Natural to extend

– Every developer

dsh

– CLI

– Full stack

– Fast

– Safe

– Secure

– Proactive

– Discoverable

Model Driven

Creates consistency

Provides guard rails

Deployment

– Cluster

• Instance

– Assembly

Configured at all levels
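
One way to read "configured at all levels" is that broader levels supply defaults and narrower levels override them. The sketch below shows that idea with plain Scala case classes. It is not dsh's actual model: the type fields, the `effectiveConfig` helper, and the setting names are all hypothetical, and Assembly is left out because the slide does not say how it relates to the other levels.

```scala
// Hypothetical model types; the slide names the levels but not their fields.
object ModelSketch {
  type Config = Map[String, String]

  final case class Instance(id: String, config: Config = Map.empty)
  final case class Cluster(name: String, instances: List[Instance], config: Config = Map.empty)
  final case class Deployment(name: String, clusters: List[Cluster], config: Config = Map.empty)

  // Resolve the settings for one instance. Deployment-level values act as
  // defaults; cluster-level and then instance-level values override them.
  // Returns None when the cluster or instance is not part of the model.
  def effectiveConfig(d: Deployment, clusterName: String, instanceId: String): Option[Config] =
    for {
      c <- d.clusters.find(_.name == clusterName)
      i <- c.instances.find(_.id == instanceId)
    } yield d.config ++ c.config ++ i.config

  // Made-up example data, for illustration only.
  val example: Deployment = Deployment(
    name = "prod",
    config = Map("logLevel" -> "INFO", "heapMb" -> "2048"),
    clusters = List(
      Cluster(
        name = "api",
        config = Map("heapMb" -> "4096"),
        instances = List(
          Instance("api-1"),
          Instance("api-2", Map("logLevel" -> "DEBUG"))
        )
      )
    )
  )

  def main(args: Array[String]): Unit = {
    println(effectiveConfig(example, "api", "api-1"))
    println(effectiveConfig(example, "api", "api-2"))
  }
}
```

Because every lookup goes through the model, asking for an unknown cluster or instance fails at the lookup step instead of silently acting on nothing, which is one plausible form of the "guard rails" the slide mentions.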

daemon restart api:p:25,receiver:p:10
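
The slide does not explain the argument syntax of this command. Read purely structurally, the argument looks like a comma-separated list of targets, each made of colon-separated fields. The sketch below parses that shape and nothing more. `Target`, `parse`, and the field names are hypothetical, and no meaning is assigned to `p` or to the numbers.

```scala
object TargetSpecSketch {
  // One comma-separated target: a leading name plus any further
  // colon-separated fields, kept as uninterpreted strings.
  final case class Target(name: String, fields: List[String])

  // Parse a spec such as "api:p:25,receiver:p:10".
  // Returns Left with a message if any target has an empty name.
  def parse(spec: String): Either[String, List[Target]] = {
    val parsed: List[Either[String, Target]] =
      spec.split(',').toList.map(_.trim).map { raw =>
        raw.split(':').toList match {
          case name :: rest if name.nonEmpty => Right(Target(name, rest))
          case _                             => Left(s"malformed target: '$raw'")
        }
      }
    parsed.collectFirst { case Left(err) => err } match {
      case Some(err) => Left(err)
      case None      => Right(parsed.collect { case Right(t) => t })
    }
  }

  def main(args: Array[String]): Unit =
    println(parse("api:p:25,receiver:p:10"))
}
```

Rejecting a malformed spec before anything is executed is in the spirit of the "Safe" bullet on the dsh slide, though how dsh itself validates input is not stated in the deck.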

dsh

– Scala

– Model based

– Trivial to extend

– Specific to OUR needs

– Meaningful defaults

– Prevents mistakes

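// Select only the running instances of the "zk" cluster.
val filter = FilterBuilder.withCluster("zk")
  .withOnlyRunningInstances.build()
val instances = deployment.connect.describeInstances(filter)
// Run the restart command on every matching instance in parallel over SSH.
instances.par.foreach { instance =>
  val ssh = instance.connectSSH
  ssh.execute("sudo service api restart")
}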
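
The snippet above depends on dsh's internal API, which the deck does not show. As a rough, self-contained stand-in, the sketch below imitates the fluent filter part with in-memory data. Every type and method here is a hypothetical reconstruction named after the calls on the slide, not the real dsh implementation, and it does not open SSH connections or restart anything.

```scala
object FilterSketch {
  // Hypothetical in-memory stand-in for a deployed instance.
  final case class Instance(id: String, cluster: String, running: Boolean)

  // A built filter: an optional cluster name plus a running-only flag.
  final case class Filter(cluster: Option[String], onlyRunning: Boolean) {
    def matches(i: Instance): Boolean =
      cluster.forall(_ == i.cluster) && (!onlyRunning || i.running)
  }

  // Immutable fluent builder: each call returns a new builder value.
  final case class FilterBuilder(cluster: Option[String] = None, onlyRunning: Boolean = false) {
    def withOnlyRunningInstances: FilterBuilder = copy(onlyRunning = true)
    def build(): Filter = Filter(cluster, onlyRunning)
  }

  object FilterBuilder {
    def withCluster(name: String): FilterBuilder = FilterBuilder(cluster = Some(name))
  }

  // Stand-in for describeInstances: filters a local list instead of
  // querying a cloud provider.
  def describeInstances(all: List[Instance], filter: Filter): List[Instance] =
    all.filter(filter.matches)

  // Made-up fleet, for illustration only.
  val fleet: List[Instance] = List(
    Instance("zk-1", "zk", running = true),
    Instance("zk-2", "zk", running = false),
    Instance("api-1", "api", running = true)
  )

  def main(args: Array[String]): Unit = {
    val filter = FilterBuilder.withCluster("zk").withOnlyRunningInstances.build()
    describeInstances(fleet, filter).foreach(i => println(i.id))
  }
}
```

Keeping the selection logic in a small builder means every script chooses its targets the same way, which fits the "Meaningful defaults" and "Prevents mistakes" bullets.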

What would we do differently next time?

Upgrade the system in a less monolithic way

Don’t ask UI developers to do operations

Clearer guidelines on managers & operations

Next Experiments

Divide up big rotation

Bring India development team into rotation

Switch from 24/7 shifts to 12/7

Deploy smaller parts of the system more often

Bring full-time operations people into the mix

Thank You!

Christian Beedgen

@raychaser

Stefan Zier

@stefanzier

We’re hiring! go.sumologic.com/jobs