Building and Operating Successful Data Centers


  • 8/3/2019 Building and Operating Successful Data Centers



    PRESENTATION GOALS

    1. Discuss and define the areas of improvement in our current service transition methods

    2. Suggest methods of refining and closing gaps

    3. Identify low-hanging fruit

    4. Gain agreement on next steps


    TERMS FOR THIS PRESENTATION

    Service Transition: the processes of transitioning services from Design -> Build -> Run (D->B->R)

    Design/Engineering (D): Kevin Matsumoto's team

    Build (B): Paul Bretton's team

    Run (R): Rich Oswald's team

    Service(s): projects, solutions, pods, zones, architecture, or any other design term used for any of the projects coming from Design to Build to Run

    MTTR: Mean Time to Repair/Recovery. The average time it takes us to repair/recover from a production outage or issue (1)

    MTBF: Mean Time Between/Before Failure(s) (1)
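As a rough illustration of how these two metrics could be computed, here is a minimal Python sketch. The outage log, the reporting window, and the function names `mttr` and `mtbf` are all hypothetical and not taken from the presentation. It assumes the common convention that MTTR is total repair time divided by the number of outages and MTBF is total uptime divided by the number of outages.

```python
from datetime import datetime, timedelta

# Hypothetical outage log: (outage began, service restored).
outages = [
    (datetime(2019, 1, 10, 8, 0), datetime(2019, 1, 10, 9, 0)),
    (datetime(2019, 1, 20, 8, 0), datetime(2019, 1, 20, 8, 30)),
    (datetime(2019, 1, 25, 12, 0), datetime(2019, 1, 25, 13, 30)),
]

# Hypothetical 30-day reporting window.
window_start = datetime(2019, 1, 1)
window_end = datetime(2019, 1, 31)


def total_downtime(outages):
    """Sum of all outage durations."""
    return sum((end - start for start, end in outages), timedelta())


def mttr(outages):
    """Mean time to repair: total downtime divided by number of outages."""
    return total_downtime(outages) / len(outages)


def mtbf(outages, window_start, window_end):
    """Mean time between failures: total uptime divided by number of outages."""
    uptime = (window_end - window_start) - total_downtime(outages)
    return uptime / len(outages)


print("MTTR:", mttr(outages))
print("MTBF:", mtbf(outages, window_start, window_end))
```

Other definitions exist (for example, measuring MTBF from the start of one failure to the start of the next), so the numbers are only comparable when every team uses the same convention.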


    MAPPING OUR TEAMS TO ITIL


    DESIGN TEAM GOALS

    Design for MTBF

    High Volume Services

    Highly Resilient Services


    BUILD TEAM GOALS

    Build new Services from Design

    Maintain Operational Services

    Expand existing Services for Business Units

    Improve stability in Services

    SMEs to support the Run Team

    Provide feedback for improvement to Design teams


    OPERATIONS TEAM GOALS

    Recover from Outages

    Low MTTR


    BUILDING BETTER PROCESSES

    Adopting better processes can reduce MTTR, which will reduce the impact of outages to our customers

    Designing for MTBF assists us with longevity of our solutions

    It's the Jeep vs. Rolls-Royce scenario

    Jeep: very low MTTR

    Rolls-Royce: very high MTBF
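To make the trade-off concrete, the sketch below uses the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). All figures are invented for illustration and are not measurements from our environment. One service fails often but is fixed quickly; the other fails rarely but is slow to repair.

```python
HOURS_PER_YEAR = 8760


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_per_year(mtbf_hours, mttr_hours):
    """Expected hours of downtime per year at the given MTBF and MTTR."""
    return (1 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


# Hypothetical figures: fails often, repaired in half an hour.
quick_to_fix = availability(mtbf_hours=200, mttr_hours=0.5)

# Hypothetical figures: fails ten times less often, takes ten times longer to fix.
rarely_fails = availability(mtbf_hours=2000, mttr_hours=5)

print(f"quick to fix: {quick_to_fix:.4%}")
print(f"rarely fails: {rarely_fails:.4%}")
print(f"downtime/yr:  {downtime_per_year(200, 0.5):.1f} hours")
```

Under these assumed numbers both services reach the same availability, because only the ratio of MTTR to MTBF matters in the formula. Halving MTTR improves availability exactly as much as doubling MTBF does.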


    THE FOUR MINUTE JEEP


    CURRENT SERVICE TRANSITION PROCESS

    Inconsistency, resource burn & high MTTR

    [Diagram: Design, Build, Run]


    HOW CAN WE IMPROVE TRANSITIONS?

    Build to Run

    More formalized handoffs?

    Design to Build


    CURRENT PROCESS

    Transitions from D to B and B to R are not currently a repeatable process

    Too much tribal knowledge

    Structure, language and deployment methods change for each build

    Handoff of systems is not tracked, scored, or monitored for success/fail or any other metric


    WHY IS THIS AN ISSUE?

    Systems deployed inconsistently

    Causes longer MTTR during troubleshooting

    Greater impact to our customers

    No accountability for transitions

    Embarrasses all of us


    GAPS IN PROCESS

    Transitional processes need to be created for D to B and B to R

    Includes design requirements, handoff checklists, handoff plans, a formal handoff meeting, and acceptance by a manager/director

    Exceptions and flaws will need to be remediated by the current owner
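One way a tracked and scored handoff could look is sketched below in Python. The class names `ChecklistItem` and `Handoff`, their fields, and the example checklist entries are hypothetical and not part of any existing tooling. The sketch assumes acceptance is refused while checklist items remain open, so that flaws stay with the current owner until they are remediated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChecklistItem:
    description: str
    done: bool = False


@dataclass
class Handoff:
    service: str
    from_team: str            # current owner, e.g. "Build"
    to_team: str              # receiving team, e.g. "Run"
    items: List[ChecklistItem] = field(default_factory=list)
    accepted_by: str = ""     # manager/director who signed off

    def score(self) -> float:
        """Fraction of checklist items completed (0.0 for an empty checklist)."""
        if not self.items:
            return 0.0
        return sum(item.done for item in self.items) / len(self.items)

    def open_items(self) -> List[str]:
        """Items the current owner still has to remediate."""
        return [item.description for item in self.items if not item.done]

    def accept(self, approver: str) -> None:
        """Record sign-off, refusing while any checklist item is still open."""
        if self.open_items():
            raise ValueError(
                f"{self.from_team} must remediate: {', '.join(self.open_items())}"
            )
        self.accepted_by = approver


# Hypothetical usage: a Build-to-Run handoff with one item outstanding.
handoff = Handoff(
    service="example-pod",
    from_team="Build",
    to_team="Run",
    items=[
        ChecklistItem("Runbook written", done=True),
        ChecklistItem("Monitoring configured"),
    ],
)
print(handoff.score(), handoff.open_items())
```

Recording the score and the approver for every handoff would give the success/fail metric that is currently missing.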


    SUGGESTED SOLUTIONS

    1. Do nothing

    2. Identify and build transition teams with skilled staff to manage the transition process

    3. Build internal processes and adopt them at least from the Communication Services level down


    DO NOTHING

    Continued inconsistency & high MTTR

    The BUs get fed up?


    BUILD A DEDICATED TRANSITION TEAM

    Pros

    Dedicated staff to manage transitions

    Geared specifically towards

    Cons

    New staff expenditures

    Increased time to delivery of new designs


    BUILD INTERNAL PROCESSES

    Pros

    Smoother transitions of projects and builds to operations

    Lower MTTR for outages

    Greater customer satisfaction

    Cons

    Change to our existing methods

    Will take several iterations to get it right


    CURRENT TEAM RESPONSIBILITIES

    Design

    Build

    Run


    END NOTES

    (1) Definitions are drawn from several sources. See the Definitions & Resources page for more information


    DEFINITIONS & RESOURCES

    Definitions

    IEEE Std 610 - Institute of Electrical and Electronics Engineers, IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries. New York, NY: 1990

    Definitions from IETF - RFC1208, RFC1980, RFC4949

    The Information Technology Infrastructure Library

    Resources

    Origins of MTTR and MTBF - Traditional Reliability - Carnegie Mellon University

    Pecht, M.G., Nash, F.R., Predicting the Reliability of Electronic Equipment, Proceedings of the IEEE, Vol. 82, No. 7, July 1994

    Mean Time Between Failure: Explanation and Standards

    http://www.kitchensoap.com/2010/11/07/mttr-mtbf-for-most-types-of-f/