The Unrealized Role of: Monitoring & Alerting
@jasonhand | VictorOps | #AllDayDevOps
JASON HAND
DevOps Evangelist
VictorOps
2015 MONITORING SURVEY
WHY ARE YOU COLLECTING THIS DATA?
NOTE: You may choose more than one.
▸ Performance analysis and trending
▸ Fault and Anomaly detection
▸ Capacity Planning
▸ A/B Testing
THE RESULTS
NOTE: Respondents may have chosen more than one.
▸ Performance analysis and trending - 63%
▸ Fault and Anomaly detection - 53%
▸ Capacity Planning - 45%
▸ A/B Testing - 11%
Tyranny of the S.L.A. (Service Level Agreement)
HIGH AVAILABILITY
Prediction & Prevention
THAT'S IMPORTANT
... BUT ...
BUSINESS OBJECTIVES?
HAPPY CAMPER
CUSTOMERS want more than just 99.999% UPTIME
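For a sense of scale (our own arithmetic, assuming a 365-day year and counting every minute equally), a 99.999% uptime target leaves only about five minutes of downtime per year:

$$
\text{allowed downtime} = (1 - 0.99999) \times 365 \times 24 \times 60\ \text{min} = 10^{-5} \times 525{,}600\ \text{min} \approx 5.26\ \text{min per year}
$$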
WHERE'S THE INNOVATION?
HOW IMPORTANT IS Learning & Innovation?
The result of underutilizing monitoring & alerting is that the IT department and the organization have no chance to...
LEARN, IMPROVE, OR INNOVATE.
CONTINUALLY UNDERSTANDING & RESPONDING TO THE FEEDBACK from monitoring, logging, & alerting allows you to use information about past events to drive future actions.
It's not just about PREDICTION & PREVENTION
RESPOND & REPAIR
...QUICKLY
NOPE
MTTR (Mean Time To Repair) Rather Than MTBF (Mean Time Between Failures)
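One common way to see why MTTR deserves the focus is the steady-state availability relation below. It assumes failures and repairs simply alternate and uses long-run averages; the example figures are hypothetical.

$$
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}} = \frac{1}{1 + \mathrm{MTTR}/\mathrm{MTBF}}
$$

$$
\mathrm{MTBF} = 1000\,\mathrm{h},\ \mathrm{MTTR} = 1\,\mathrm{h}: \quad A = \frac{1000}{1001} \approx 0.99900
$$

$$
\text{halve MTTR: } \frac{1000}{1000.5} \approx 0.99950 \qquad \text{double MTBF: } \frac{2000}{2001} \approx 0.99950
$$

Because $A$ depends only on the ratio $\mathrm{MTTR}/\mathrm{MTBF}$, halving repair time buys exactly as much availability as doubling the time between failures.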
FAILURE IS INEVITABLE
US·ER /ˈYOOZƏR/
DISTRIBUTED FAULT INJECTION TEST SUITE FOR PRODUCTION.
credit: Leon Fayer (@papa_fire)
SUCCESS is a result of FAILURE
UNDERSTAND
LEARN
INNOVATE
RE·SIL·IENT /RƏˈZILYƏNT/
The ability to resist, absorb, recover from, or successfully adapt to adversity or a change in conditions.
CHANGE can cause failure,
but innovation requires
CHANGE
CONFLICT
CHANGE REQUIRED
Without deviation from the norm, progress is not possible
— Frank Zappa
What Did You LEARN From the Recovery Efforts?
(including monitoring & alerting)
POSTMORTEMS / LEARNING REVIEWS: Stories of:
WHAT TOOK PLACE leading up to & during the disruption & recovery efforts
WHO WAS INVOLVED?
WHAT DID THEY SEE?
WHAT WAS SAID?
WHAT ACTIONS WERE TAKEN?
jhand.co/chatopsbook
HOW DO events & actions CORRELATE OVER TIME?
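One way to start answering that question in a postmortem is to merge monitoring alerts and operator actions (for example, chat messages) into a single timeline ordered by timestamp. This is a minimal sketch, not a VictorOps feature or API: the `Event` type, its `source` labels, the 10-minute window, and the sample incident data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
import heapq


@dataclass(frozen=True)
class Event:
    at: datetime   # when the event happened
    source: str    # hypothetical labels: "alert" or "chat"
    text: str


def build_timeline(*streams):
    """Merge event streams (each already sorted by time) into one chronological list."""
    return list(heapq.merge(*streams, key=lambda e: e.at))


def actions_after(timeline, alert, window_s):
    """Non-alert events that happened within window_s seconds after the given alert."""
    return [
        e for e in timeline
        if e.source != "alert"
        and 0 <= (e.at - alert.at).total_seconds() <= window_s
    ]


# Hypothetical incident data for illustration only.
alerts = [
    Event(datetime(2016, 4, 1, 10, 0), "alert", "CPU > 90% on web-1"),
    Event(datetime(2016, 4, 1, 10, 20), "alert", "5xx rate elevated"),
]
chat = [
    Event(datetime(2016, 4, 1, 10, 3), "chat", "restarting web-1"),
    Event(datetime(2016, 4, 1, 10, 25), "chat", "rolling back deploy"),
]

timeline = build_timeline(alerts, chat)
for e in timeline:
    print(e.at.strftime("%H:%M"), e.source, e.text)

# Actions taken within 10 minutes of the first alert.
print([e.text for e in actions_after(timeline, alerts[0], window_s=600)])
```

Closeness in time only shows what happened near an alert, not why; it is raw material for the story a learning review tells, not a verdict on cause.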
5 Whys
WHAT IS THE "cause" OF THE PROBLEM?
Root Cause is ...
OUR... obsession with "Root Cause"
ASKING "WHY"... leads to ... BLAME
BLAMING LEADS TO... operators hiding relevant & important information
We must BELIEVE that our operators are doing their best given the constraints of the "system"
"We are here to"
LEARN From Failure
(and success)
RATHER THAN ...
AVOID FAILURE
WHAT'S THE STORY?
INNOVATE
Learning from both success & failure to develop & implement small incremental improvements is critical.
MONITORING & ALERTING helps us understand the story in greater detail
LEARNING ORGANIZATION
Learning does NOT come from READING & LISTENING
Learning comes from DOING
Real Learning comes from: OBSERVING, ORIENTING, DECIDING, ACTING
John Boyd's OODA Loop
Example: LEARNING TO PLAY THE DOBRO GUITAR
LEARNING
WHY? Go from knowing... to understanding... to learning
NOTE: (Requires making mistakes)
We will trade some uptime in exchange for innovation.
- Dave Hahn (Netflix), DevOpsDays Boise 2016
SHIFT OUR GAZE from: MAINTAINING & PROTECTING
to: LEARNING, which leads to... IMPROVING & INNOVATING
WE INCREASE THE VALUE OF:
- Monitoring & Alerting
- IT teams
- Products & Services
- Organization
HYPOTHESIZE, EXPLORE, STRETCH, EXPERIMENT, FAIL, LEARN, Try Again
LEARNING & INNOVATING leads to uncovering new ways of BUILDING, DEPLOYING, AND MAINTAINING SOFTWARE & INFRASTRUCTURE
Which leads to...
RESILIENT SYSTEMS
The By-product of a highly RESILIENT system is ...
HIGHLY AVAILABLE SYSTEM
THE UNREALIZED ROLE OF: Monitoring & Alerting is ...
LEARNING & INNOVATION
THANK YOU
Be Victorious!
References:
Monitoring Survey: https://kartar.net/2015/08/monitoring-survey-2015---metrics/
Firefighter: https://www.learyfirefighters.org/wp-content/uploads/2013/09/cover-slide-1.jpg
Mechanic: https://upload.wikimedia.org/wikipedia/commons/4/4b/Flickr_-_Israel_Defense_Forces_-_Airplane_Technician,_March_2010.jpg
Gnome Plan: http://www.nerdfitness.com/wp-content/uploads/2012/04/Screen-Shot-2012-03-30-at-3.15.38-AM-1024x7591.jpg
NOC: https://upload.wikimedia.org/wikipedia/commons/0/03/
Kodak: http://file.answcdn.com/answ-cld/image/upload/v1/tk/brand_image/b59911fc/91d6e71d30a0878dfe3cb30a22751cb874a3ea8c.jpeg
VW Camper: https://upload.wikimedia.org/wikipedia/commons/d/d7/VW_Camper.jpg
Blockbuster: https://jordanandeddie.files.wordpress.com/2013/11/blockbuster-feature.jpg
Borders: http://smashingtops.com/wp-content/uploads/2012/06/borders_logo1.jpg
Chained Hands: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=0ahUKEwjgrNCDh5TMAhXJs4MKHaoZDssQjBwIBA&url=http%3A%2F%2Fwww.publicdomainpictures.net%2Fdownload-picture.php%3Fadresar%3D50000%26soubor%3Dhands-in-chains.jpg%26id%3D40426&bvm=bv.119745492,d.amc&psig=AFQjCNFIdnDPzSqiLA-znIW5SCTCUHhqEw&ust=1460926880336203
Inevitable: http://vignette4.wikia.nocookie.net/matrix/images/5/51/SMITH.png/revision/latest?cb=20110214092002
Bulb: https://smhttp-ssl-37293.nexcesscdn.net/media/catalog/
Accident Free: http://www.compliancesigns.com/media/digital-scoreboard/1000/Safety-Awareness-Sign-DSE-195271000.gif
Stewie: http://chroniclesofredmark.com/wp-content/uploads/2014/01/Stewie.gif
change: http://i.imgur.com/EQyC6N3.gif
Hard drive: https://i.imgur.com/pWsKSEf.gif
Change: https://farm6.staticflickr.com/5208/5270199049df99b234e9od.jpg
Value: https://d13yacurqjgara.cloudfront.net/users/6437/screenshots/1405551/value-cropped.gif