Mistakes were made - LCA 2012

Some Conference

SCALE 10x

LCA 2012

Mistakes were made
Selena Deckelmann

selena@primeradiant.com
Twitter/IRC: @selenamarie

Failure

“Prevention”
“Risk management”
“Risk mitigation”
“MTBF, MTTR”

“Success Engineering”

Plan for the worst.
Minimize risk.
Fail.
Recover, gracefully.

“We don’t need a risk management plan,” he emphatically stated, “because this project can’t be allowed to fail.”

- Jim Highsmith, http://jimhighsmith.com/2012/01/09/can-do-thinking-makes-risk-management-impossible/

Failure is an option.

SCIENCE

Dr. Jerker Denrell 

"I think getting two accidents of this type at the same time is a freak occurrence."

- David Cunliffe, NZ Communications Minister

“Further damage was incurred on Tuesday afternoon and our engineers returned to repair the damage,” said Virgin Media.

Plan for when things fail.

Tales of failure to...

Document

Test

Verify

Imagine

Implement

Failure to document.

Moving Day

Thanks, David Prior!

Prevent documentation failures.

• Write documentation.

• Update documentation.

• Make documenting a step in your written process.

• Assign a fixed amount of time to that step.

Documentation tools

• Graphic designers: pretty wikis, pretty docs (Sphinx?), diagrams.

• Timelines.

• Bug tracking.

• Ordered todo lists.

Failure to test.

“My first day posing as a sysadmin (~1990, no previous training....) I deleted all zero length files on a Sun workstation.”

Prevent testing failures.

• Verify success criteria.

• Write tests.

• Test with a buddy.

• Have a plan.

Testing tools

• Your favorite test framework

• Repeatable shell scripts

• Staging environments
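As a sketch of "verify success criteria" and "write tests", here is a hypothetical cleanup script in the spirit of the zero-length-files story, plus a test that runs it against a throwaway directory instead of a real machine. The names `find_empty_files`, `delete_empty_files`, and the `dry_run` default are illustrative assumptions, not anything from the talk.

```python
import tempfile
from pathlib import Path


def find_empty_files(root: Path) -> list[Path]:
    """Return every zero-length regular file under root, sorted."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and not p.is_symlink() and p.stat().st_size == 0
    )


def delete_empty_files(root: Path, dry_run: bool = True) -> list[Path]:
    """Delete zero-length files under root. Defaults to a dry run that only reports."""
    targets = find_empty_files(root)
    if not dry_run:
        for p in targets:
            p.unlink()
    return targets


def test_delete_empty_files() -> None:
    # Build a throwaway tree instead of pointing the script at a real system.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "keep.txt").write_text("data")
        (root / "empty.lock").touch()
        (root / "sub").mkdir()
        (root / "sub" / "empty.flag").touch()

        # Success criterion 1: a dry run reports the targets but removes nothing.
        planned = delete_empty_files(root)
        assert planned == sorted([root / "empty.lock", root / "sub" / "empty.flag"])
        assert all(p.exists() for p in planned)

        # Success criterion 2: a real run removes only the empty files.
        delete_empty_files(root, dry_run=False)
        assert not any(p.exists() for p in planned)
        assert (root / "keep.txt").read_text() == "data"


if __name__ == "__main__":
    test_delete_empty_files()
    print("ok")
```

Making the destructive path opt-in (`dry_run=True` by default) means the first run of a repeatable script shows you what it would do before it does it, which is also an easy thing to walk through with a test buddy.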

Failure to verify.

“What does ‘-d’ actually do?”

Prevent verification failures.

• Have a plan for things going wrong.

• Have a staging environment.

• Test your rollback plan, not just your implementation plan.
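Testing the rollback plan means deliberately exercising the failure path, not just the happy path. Below is a minimal sketch, assuming each change can be modelled as a step with an `apply` and a `rollback` function, and using an in-memory dict as a stand-in for the real system. `Step`, `run_plan`, and the example step functions are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, str]


@dataclass
class Step:
    name: str
    apply: Callable[[State], None]
    rollback: Callable[[State], None]


def run_plan(steps: List[Step], state: State) -> bool:
    """Apply steps in order. If one raises, roll back the completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.apply(state)
        except Exception:
            # Assumes the failing step changed nothing, so only the steps
            # that finished are undone, newest first.
            for done in reversed(completed):
                done.rollback(state)
            return False
        completed.append(step)
    return True


def bump_version(state: State) -> None:
    state["previous_version"] = state["version"]
    state["version"] = "2.0"


def restore_version(state: State) -> None:
    state["version"] = state.pop("previous_version")


def broken_migration(state: State) -> None:
    raise RuntimeError("simulated failure")


def nothing_to_undo(state: State) -> None:
    pass


if __name__ == "__main__":
    # Exercise the rollback path on purpose: the second step always fails.
    original = {"version": "1.0"}
    state = dict(original)
    plan = [
        Step("bump version", bump_version, restore_version),
        Step("migrate", broken_migration, nothing_to_undo),
    ]
    assert run_plan(plan, state) is False
    assert state == original, "rollback did not restore the starting state"
    print("rollback restored", state)
```

The check that matters is the last assertion: after a forced failure, the state should match a snapshot taken before the plan started. In a staging environment the same idea applies with a real backup or schema dump as the snapshot.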

Verification tools

• Staging environments

• Your buddy

Failure to imagine.

For my group the bottom line was "don't trust anyone."

Thanks, Maggie!

Recover from failures to imagine.

• Share your stories of failure.

• Talk with people who are different from you.

• Act out implementation scenarios.

Failure to implement.

Re-implement.

• Learn from mistakes.

Reflection (or, the Post-Mortem)

Before

• Plan to do a post-mortem.

• Document the plan with numbered steps and a timeline.

• Test the plan and the rollback plan.

• Identify a “point of no return”.
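A plan with numbered steps, a timeline, and an explicit point of no return can be kept as data rather than prose, so the timeline is recomputed whenever a step changes. This is a sketch under assumptions: `PlanStep`, `timeline`, `point_of_no_return`, and the example maintenance steps are hypothetical, and "point of no return" is taken to mean the first step whose rollback plan no longer applies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PlanStep:
    number: int
    description: str
    minutes: int             # fixed time budget for this step
    reversible: bool = True  # False once the rollback plan no longer applies


def point_of_no_return(steps: List[PlanStep]) -> Optional[int]:
    """Number of the first step that cannot be rolled back, if any."""
    for step in steps:
        if not step.reversible:
            return step.number
    return None


def timeline(steps: List[PlanStep], start: datetime) -> List[str]:
    """Render each numbered step with its scheduled start time."""
    pnr = point_of_no_return(steps)
    lines = []
    t = start
    for step in steps:
        marker = "  <-- point of no return" if step.number == pnr else ""
        lines.append(f"{step.number}. {t:%H:%M} {step.description}{marker}")
        t += timedelta(minutes=step.minutes)
    return lines


# Hypothetical maintenance plan; note documentation gets its own fixed slot.
PLAN = [
    PlanStep(1, "Announce maintenance window", 5),
    PlanStep(2, "Take and verify a backup", 30),
    PlanStep(3, "Run schema migration", 20),
    PlanStep(4, "Drop old tables", 10, reversible=False),
    PlanStep(5, "Update documentation", 15),
]

if __name__ == "__main__":
    for line in timeline(PLAN, datetime(2012, 1, 1, 22, 0)):
        print(line)
```

With a 22:00 start, step 4 is scheduled for 22:55 and is the one flagged as the point of no return, which gives the designated time-keeper a concrete go/no-go moment to call.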

During

• Screen sharing: UNIX screen, VNC, etc.

• Chatroom: IRC, AIM, Campfire (scrollback!)

• Voice: Campfire, Skype, VoIP, POTS call line

• Headsets!

• Designated time-keeper.

After

• Documentation updates

• Post-mortem to identify areas of success and areas for improvement.

• Limit improvements to 1-2 things.

Plan for the worst.
Minimize risk.
Fail.
Recover, gracefully.

Thanks!

Photo credits

• Flickr: sheepguardingllama