Devops, Dungeons & Dragons


Presentation for Devops downunder, aka devopsdays Sydney 2013. What does it mean to be an expert at the art of operations? How do we learn such things? Can we run firedrills like a D&D campaign?


David Lutz

beginner vs expert

HP 500 ATK 20

Scenario 1

Phone goes off at 3am...

HP 50 ATK 10

#fail because Johnny didn't

● remain cool under pressure
● make judgement call on severity of problem
● keep track of time
● consider escalating
● make a guess as to what the problem might be
● communicate what's going on
● preserve forensic evidence

Scenario 2

Phone goes off at 3am...

HP 700 ATK 400

beginner vs expert

instinct or experience?

how do you level up your skills?

SPELL CARD lose 10 HP per turn

Daemon HP 800 ATK 15

How to level up?

SPELL CARD Lose 100 HP

Four Stages for Learning Any New Skill

Noel Burch

SPELL CARD ATK +20 per turn

1. Unconscious incompetence: "I don't know what I don't know"

2. Conscious incompetence: "I know what I don't know"

3. Conscious competence: "I know it, but it's hard"

4. Unconscious competence: "I know it so well, I don't need to think about it"

Training and practice

reduce the time between the four stages

HP 100 ATK 70

1. Unconscious incompetence (first 6 days)

2. Conscious incompetence (first 6 weeks)

3. Conscious competence (first 6 months)

4. Unconscious competence (6 months+)

David's rule of thumb for new employees

Learning theories

Adults can learn in abstract ways: reading about something, or observing someone else doing it.

Children learn more directly, by doing. Role play.

Daemon HP 40 ATK 25

Role-play, fire drills, wargames are a powerful way to learn things.

SPELL CARD HP +40 ATK +10

Practice dealing with emergencies

Daemon lose a turn HP -5

Responsibilities of devops dungeonmaster

1. Plan the scenario beforehand
2. Explain to the rest of the team what's happened and break stuff
3. Monitor the situation and take notes
4. Time each event during the scenario
5. Postmortem
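Responsibilities 3 and 4 (take notes, time each event) could be supported by something as small as a timestamped log. The sketch below is only an illustration: the `DrillLog` class, the scenario name and the timestamps are all made up here and are not from the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DrillLog:
    """Notes the dungeonmaster keeps during one firedrill (hypothetical structure)."""
    scenario: str
    events: list = field(default_factory=list)

    def record(self, when: datetime, note: str) -> None:
        # Store each observation together with the time it happened.
        self.events.append((when, note))

    def elapsed(self) -> timedelta:
        # Time from the earliest recorded event to the latest one.
        if len(self.events) < 2:
            return timedelta(0)
        times = [when for when, _ in self.events]
        return max(times) - min(times)


# Made-up scenario and timestamps, for illustration only.
start = datetime(2013, 1, 1, 3, 0)
log = DrillLog("primary database unplugged")
log.record(start, "fault injected")
log.record(start + timedelta(minutes=4), "alert fired")
log.record(start + timedelta(minutes=21), "service restored")
print(log.elapsed())  # 0:21:00
```

The recorded events double as raw material for the postmortem, since they show how long each phase of the response took.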

Pass on knowledge

by doing!

SPELL CARD ATK +20

The rest of the team

1. Identify the problem
2. Resolve the problem as if it was a real incident at 3am
3. Exercise the alerting systems, monitoring systems, comms systems
4. Learn from each other, talk through what's going on

objective:

Positive outcome for the business

Reduce MTTR
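MTTR (mean time to recovery) is the average of the recovery times across incidents, so timed firedrills give a number to track. A minimal sketch, assuming recovery times are already measured; the function name and the three durations are invented for illustration.

```python
from datetime import timedelta


def mean_time_to_recovery(durations):
    """Average the time-to-recover across incidents or drills."""
    if not durations:
        raise ValueError("need at least one duration")
    # sum() needs a timedelta start value, otherwise it starts from int 0.
    return sum(durations, timedelta(0)) / len(durations)


# Made-up recovery times from three successive drills.
drills = [timedelta(minutes=45), timedelta(minutes=30), timedelta(minutes=15)]
mttr = mean_time_to_recovery(drills)
print(mttr)  # 0:30:00
```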

SPELL CARD both lose half ATK

After the firedrill/campaign...

Dungeonmaster runs a postmortem-style retro

1. Was monitoring and alerting sufficient?
2. Could recovery have been quicker? How?
3. Did we uncover any latent faults or unknown dependencies?
4. Involve the developers. For example, could better kill switches or levers be put into the apps to aid operating them?

Fantasy Creatures

Or, think about how you want your team structured.

SPELL CARD HP -10 ATK +10

For example:

Party of 4 dwarves wouldn't work well

SPELL CARD 2x ATK/round

Dwarves - Slow, very tough and strong, not very smart, like mining

Wizards - Good at Magic

Elves - Fast, somewhat magical, live in the forest

Humans - Not especially good at anything, but adaptable

Creature attributes

Strength

Speed

Magic ability

Daemon HP 666 ATK 60

Engineer attributes

Programming

Operating Systems

Data Modelling/Management

Networking

Metrics, Troubleshooting

SPELL CARD Roll -10x1d20 HP

Dwarves == Specialists

Wizards == Developers

Elves == Sysadmins

Humans == Generalists

Developer

Programming ✭✭✭✭✭

Operating Systems ✭✭

Data Modelling/Management ✭✭✭

Networking ✭

Metrics, Troubleshooting ✭✭✭

Sysadmin

Programming ✭

Operating Systems ✭✭✭✭✭

Data Modelling/Management ✭✭

Networking ✭✭✭

Metrics, Troubleshooting ✭✭✭✭

Specialists

DBAs/Network Engineers/QA

extremely high skills in one of
● Data Modelling/Management
● Networking
● Metrics, Troubleshooting (and bug finding)

Generalists

Architects/Automators

Wide range of skills, but may not be expert in any area
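The star ratings can be read as a rough way to check a party's balance. The sketch below uses the Developer and Sysadmin ratings from the slides; the `coverage` and `gaps` helpers and the threshold of 4 stars are assumptions added here, not part of the talk.

```python
SKILLS = [
    "Programming",
    "Operating Systems",
    "Data Modelling/Management",
    "Networking",
    "Metrics, Troubleshooting",
]

# Star ratings as given on the Developer and Sysadmin slides.
DEVELOPER = dict(zip(SKILLS, [5, 2, 3, 1, 3]))
SYSADMIN = dict(zip(SKILLS, [1, 5, 2, 3, 4]))


def coverage(party):
    """Best rating available in the party for each skill (party must be non-empty)."""
    return {skill: max(member[skill] for member in party) for skill in SKILLS}


def gaps(party, threshold=4):
    """Skills where nobody in the party reaches the (assumed) threshold."""
    return [skill for skill, best in coverage(party).items() if best < threshold]


print(gaps([DEVELOPER, DEVELOPER]))
# ['Operating Systems', 'Data Modelling/Management', 'Networking', 'Metrics, Troubleshooting']
print(gaps([DEVELOPER, SYSADMIN]))
# ['Data Modelling/Management', 'Networking']
```

Under these assumptions a party of identical members leaves most skills uncovered, and the gaps left by a developer plus a sysadmin are the areas the slides assign to specialists.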

The end

Questions?

SPELL CARD invulnerability