8/3/2019 Building and Operating Successful Data Centers
PRESENTATION GOALS
1. Discuss and define the areas of improvement in our current service transition
methods
2. Suggest methods of refining and closing gaps
3. Identify low-hanging fruit
4. Gain agreement on next steps
TERMS FOR THIS PRESENTATION
Service Transition: the process of transitioning services from Design -> Build ->
Run (D->B->R)
Design/Engineering (D): Kevin Matsumoto's team
Build (B): Paul Bretton's team
Run (R): Rich Oswald's team
Service(s): projects, solutions, pods, zones, architecture, or any other design term
used for any of the projects coming from Design to Build to Run
MTTR: Mean Time to Repair/Recovery. The average time it takes us to
repair/recover from a production outage or issue (1)
MTBF: Mean Time Between/Before Failure(s) (1)
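As a rough illustration of the two metrics, the sketch below computes MTTR and MTBF from an outage log. The log entries are made-up values, and the MTBF convention used here (uptime from the end of one outage to the start of the next) is an assumption; some sources measure from failure start to failure start instead.

```python
from datetime import datetime, timedelta

# Hypothetical outage log: (start, end) of each production outage, in time order.
outages = [
    (datetime(2019, 1, 1, 10, 0), datetime(2019, 1, 1, 11, 0)),    # 1 hour
    (datetime(2019, 1, 11, 10, 0), datetime(2019, 1, 11, 13, 0)),  # 3 hours
    (datetime(2019, 1, 21, 10, 0), datetime(2019, 1, 21, 12, 0)),  # 2 hours
]

def mttr(log):
    """Mean time to repair: the average outage duration."""
    total = sum((end - start for start, end in log), timedelta())
    return total / len(log)

def mtbf(log):
    """Mean time between failures: the average uptime between
    the end of one outage and the start of the next (assumed convention)."""
    gaps = [next_start - prev_end
            for (_, prev_end), (next_start, _) in zip(log, log[1:])]
    return sum(gaps, timedelta()) / len(gaps)

print("MTTR:", mttr(outages))
print("MTBF:", mtbf(outages))
```

With these sample values the average repair time is 2 hours and the average uptime between outages is 238 hours.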
MAPPING OUR TEAMS TO ITIL
DESIGN TEAM GOALS
Design for MTBF
High Volume Services
Highly Resilient Services
BUILD TEAM GOALS
Build new Services from Design
Maintain Operational Services
Expand existing Services for Business Units
Improve stability in Services
SMEs to support the Run Team
Provide feedback for improvement to Design teams
OPERATIONS TEAM GOALS
Recover from Outages
Low MTTR
BUILDING BETTER PROCESSES
Adopting better processes can reduce MTTR, which will reduce the impact of outages
on our customers
Designing for MTBF helps with the longevity of our solutions
It's the Jeep vs. Rolls-Royce scenario
Jeep: very low MTTR
Rolls-Royce: very high MTBF
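One way to make the trade-off concrete is the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below assumes the Jeep stands for a service that fails more often but is repaired quickly, and the Rolls-Royce for one that fails rarely but is slow to repair. All numbers are hypothetical and chosen only for illustration.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical figures, not measurements from any real service.
jeep = availability(mtbf_hours=100, mttr_hours=0.1)    # fails often, fixed in minutes
rolls = availability(mtbf_hours=1000, mttr_hours=10)   # fails rarely, slow to fix

print(f"Jeep:        {jeep:.4f}")
print(f"Rolls-Royce: {rolls:.4f}")
```

Under these assumed numbers the quick-to-repair service ends up with the higher availability, which is the argument for investing in MTTR alongside MTBF.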
THE FOUR MINUTE JEEP
CURRENT SERVICE TRANSITION
PROCESS
Inconsistency, resource burn & high MTTR
Run
Build
Design
HOW CAN WE IMPROVE TRANSITIONS?
Build to Run
More formalized handoffs?
Design to Build
CURRENT PROCESS
Transitions from D to B and B to R are not currently a repeatable process
Too much tribal knowledge
Structure, language and deployment methods change for each build
Handoff of systems is not tracked, scored, or monitored for success/fail or any
other metric
WHY IS THIS AN ISSUE?
Systems deployed inconsistently
Causes longer MTTR during troubleshooting
Greater impact to our customers
No accountability for transitions
Embarrasses all of us
GAPS IN PROCESS
Transitional processes need to be created for D to B and B to R
Includes design requirements, handoff checklists, handoff plans, a formal handoff
meeting, and acceptance by a manager/director
Exceptions and flaws will need to be remediated by the current owner
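A tracked handoff along these lines could be sketched as a small record with a checklist, a completion score, and a sign-off that is refused while items remain open. The class name, field names, and checklist items below are hypothetical and only illustrate the idea; they are not an existing tool or an agreed process.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Handoff:
    """Hypothetical record of one service transition (D->B or B->R)."""
    service: str
    stage: str                                   # e.g. "D->B" or "B->R"
    checklist: Dict[str, bool] = field(default_factory=dict)  # item -> done?
    accepted_by: Optional[str] = None            # manager/director who signed off

    def score(self) -> float:
        """Fraction of checklist items completed (0.0 for an empty checklist)."""
        if not self.checklist:
            return 0.0
        return sum(self.checklist.values()) / len(self.checklist)

    def open_items(self) -> List[str]:
        """Items the current owner still has to remediate."""
        return [item for item, done in self.checklist.items() if not done]

    def accept(self, approver: str) -> None:
        """Record acceptance, refusing while any item is still open."""
        if self.open_items():
            raise ValueError(f"open items remain: {self.open_items()}")
        self.accepted_by = approver

handoff = Handoff(
    service="example-pod",
    stage="B->R",
    checklist={
        "design requirements": True,
        "handoff checklist": True,
        "handoff plan": True,
        "formal handoff meeting": False,
    },
)
print(handoff.score(), handoff.open_items())
```

Keeping the score and open items per handoff gives the success/fail metric that the current process lacks, and leaves remediation with the current owner until acceptance.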
SUGGESTED SOLUTIONS
1. Do nothing
2. Identify and build transition teams with skilled staff to manage the transition
process
3. Build internal processes and adopt them from at least the Communication Services
level down
DO NOTHING
Continued inconsistency & high MTTR
The BUs get fed up?
BUILD A DEDICATED TRANSITION TEAM
Pros
Dedicated staff to manage transitions
Geared specifically towards
Cons
New staff expenditures
Increased time to delivery of new designs
BUILD INTERNAL PROCESSES
Pros
Smoother transitions of projects and builds to operations
Lower MTTR for outages
Greater customer satisfaction
Cons
Change to our existing methods
Will take several iterations to get it right
CURRENT TEAM RESPONSIBILITIES
Design
Build
Run
END NOTES
(1) Definitions from several resources. See Resources page for more information
DEFINITIONS & RESOURCES
Definitions
IEEE Std 610 - Institute of Electrical and Electronics Engineers, IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries. New York, NY: 1990
Definitions from IETF - RFC1208, RFC1980, RFC4949
The Information Technology Infrastructure Library
Resources
Origins of MTTR and MTBF - Traditional Reliability - Carnegie Mellon University
Pecht, M.G., Nash, F.R., Predicting the Reliability of Electronic Equipment, Proceedings of the IEEE, Vol. 82, No. 7, July 1994
Mean Time Between Failure: Explanation and Standards
http://www.kitchensoap.com/2010/11/07/mttr-mtbf-for-most-types-of-f/