Some Conference
SCALE 10x
LCA 2012
Mistakes were made
Selena Deckelmann
[email protected]/IRC: @selenamarie
Failure
“Prevention”
“Risk management”
“Risk mitigation”
“MTBF, MTTR”
“Success Engineering”
Plan for the worst.
Minimize risk.
Fail.
Recover, gracefully.
“We don’t need a risk management plan,” he emphatically stated, “because this project can’t be allowed to fail.”
- Jim Highsmith, http://jimhighsmith.com/2012/01/09/can-do-thinking-makes-risk-management-impossible/
Failure is an option.
SCIENCE
Dr. Jerker Denrell
"I think getting two accidents of this type at the same time
is a freak occurrence."
- David Cunliffe, NZ Communications Minister
“Further damage was incurred on Tuesday afternoon and our engineers returned to repair the damage,” said Virgin Media.
Plan for when things fail.
Tales of failure to...
Document
Test
Verify
Imagine
Implement
Failure to document.
Moving Day
Thanks, David Prior!
Prevent documentation failures.
• Write documentation.
• Update documentation.
• Make documenting a step in your written process.
• Assign a fixed amount of time to that step.
Documentation tools
• Graphic designers. (Pretty wikis. Pretty docs. (Sphinx?) Diagrams.)
• Timelines.
• Bug tracking.
• Ordered todo lists.
Failure to test.
“My first day posing as a sysadmin (~1990, no previous training....) I deleted all zero length files on a Sun workstation.”
Prevent testing failures.
• Verify success criteria.
• Write tests.
• Test with a buddy.
• Have a plan.
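
One way to "verify success criteria" before a destructive cleanup is to make the script list what it would delete and only act when told to. The sketch below is illustrative only: the function names `find_zero_length_files` and `cleanup`, and the dry-run-by-default behavior, are assumptions made for this example and are not from the talk.

```python
import os
from pathlib import Path


def find_zero_length_files(root):
    """Return a sorted list of regular, zero-byte files under root."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Skip symlinks so only real, empty regular files are reported.
            if path.is_file() and not path.is_symlink() and path.stat().st_size == 0:
                matches.append(path)
    return sorted(matches)


def cleanup(root, dry_run=True):
    """List zero-length files; delete them only when dry_run is False."""
    targets = find_zero_length_files(root)
    for path in targets:
        if dry_run:
            # Hypothetical safety default: report, do not delete.
            print(f"would delete: {path}")
        else:
            path.unlink()
    return targets
```

Running `cleanup(some_dir)` first and reading the list with a buddy gives a chance to notice that some of those empty files (lock files, flag files) exist on purpose.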
Testing tools
• Your favorite test framework
• Repeatable shell scripts
• Staging environments
Failure to verify.
“What does ‘-d’ actually do?”
Prevent verification failures.
• Have a plan for things going wrong.
• Have a staging environment.
• Test your rollback plan, not just your implementation plan.
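
Testing the rollback plan is easier when each step is written down together with its undo. The following is a minimal sketch under that assumption; `run_with_rollback` and the `(name, apply, undo)` step shape are hypothetical names invented for this example, not a tool mentioned in the talk.

```python
def run_with_rollback(steps):
    """Run (name, apply, undo) steps in order.

    If a step raises, undo the steps that already completed, in reverse
    order. Returns (succeeded, log) so the run can be reviewed afterwards.
    """
    log = []
    completed = []
    for name, apply, undo in steps:
        try:
            apply()
        except Exception as exc:
            log.append(f"FAILED {name}: {exc}")
            # Undo only what actually completed, newest first.
            for done_name, done_undo in reversed(completed):
                done_undo()
                log.append(f"rolled back {done_name}")
            return False, log
        completed.append((name, undo))
        log.append(f"applied {name}")
    return True, log
```

In a staging environment, deliberately making one step fail exercises the undo path before it is needed in production.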
Verification tools
• Staging environments
• Your buddy
Failure to imagine.
For my group the bottom line was
"don't trust anyone".
Thanks, Maggie!
Recover from failures to imagine.
• Share your stories of failure.
• Talk with people who are different from you.
• Act out implementation scenarios.
Failure to implement.
Re-implement.
• Learn from mistakes.
Reflection. (or, the Post-Mortem)
Before
• Plan to do a post-mortem.
• Document the plan with numbered steps and a timeline.
• Test the plan and the rollback plan.
• Identify a “point of no return”.
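
A plan with numbered steps, a timeline, and a marked point of no return can be kept as plain data and printed before the work starts. This is only a sketch: the `Step` fields, the `render_plan` helper, and the sample steps in the usage note are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Step:
    description: str
    minutes: int  # estimated duration, used to build the timeline
    point_of_no_return: bool = False


def render_plan(steps, start):
    """Return numbered plan lines with a scheduled start time per step."""
    lines = []
    clock = start
    for number, step in enumerate(steps, start=1):
        marker = "  <-- POINT OF NO RETURN" if step.point_of_no_return else ""
        lines.append(
            f"{number}. {clock:%H:%M} {step.description} ({step.minutes} min){marker}"
        )
        clock += timedelta(minutes=step.minutes)
    lines.append(f"Done by {clock:%H:%M}")
    return lines
```

For example, `render_plan([Step("Take backup", 30), Step("Run migration", 20, point_of_no_return=True)], datetime(2012, 1, 20, 22, 0))` yields a numbered list that the designated time-keeper can check against the clock during the work.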
During
• Screen sharing: UNIX screen, VNC, etc.
• Chatroom: IRC, AIM, Campfire (scrollback!)
• Voice: Campfire, Skype, VOIP, POTS call line
• Headsets!
• Designated time-keeper.
After
• Documentation updates
• Post-mortem to identify areas of success and areas for improvement.
• Limit improvements to 1-2 things.
Plan for the worst.
Minimize risk.
Fail.
Recover, gracefully.
Thanks!
Mistakes were made
Selena Deckelmann
[email protected]/IRC: @selenamarie
Photo credits
• Flickr: sheepguardingllama