Visible Ops: Building Effective & Auditable ITIL Change Management Processes in 4 Steps: Phase One



  • Visible Ops: Building Effective & Auditable ITIL Change Management Processes in 4 Steps: Phase One
    Gene Kim, CTO, Tripwire, Inc.
    October 27, 2004

  • The Challenges
    How do I simultaneously contain costs, improve security and service levels, and address regulatory compliance?
    What is my first step in building an ITIL change management process? How will I know that it's working?
    In what order should I tackle the ITIL process areas?
    How do I attest to auditors that I have effective change management processes?
      Sarbanes-Oxley Section 404
      HIPAA, GLBA, CFR11a, etc.
    How do COBIT and ITIL fit together?
    How do I create a good working relationship with my auditors?
    What do auditors doing controls-based audits look for?
    What happens if they cannot find effective controls?

  • Agenda
    Examine the high-performing IT operations and security organizations
      What they all have in common
      What we can learn from them
    Define the ideal working relationship between IT and audit
      Why auditors talk the way they do
      What auditors need to see
    Building auditable and effective change management processes in four steps:
      Stabilize Patient
      Catch & Release and Find Fragile Artifacts
      Establish Repeatable Build Library
      Enable Continuous Improvement

  • The Highest Performing IT Organizations
    High-performance Ops and Security organizations have:
      Highest ratio of staff deployed on pre-production processes
      Lowest amount of unplanned work
      Highest change success rate
      Best posture of compliance and security

  • Common Process Areas Of High Performers
    All the high performers had self-derived the same way of working:
      Culture of change management
      Culture of causality
      Culture of compliance and desire to continually reduce variance

  • Common Traits Of The Highest Performers
    Culture of change management
      Integration of IT operations and security processes via problem management and change management processes
      Processes that serve both organizational needs and business objectives
      Highest rate of effective change (approved changes, change success rate)
    Culture of causality
      Highest service levels (MTTR, MTBF)
      Highest first fix rate (unneeded rework)
    Culture of compliance and continual reduction of operational variance
      Production configurations
      Highest level of pre-production staffing
      Effective pre-production controls
      Effective pairing of preventive and detective controls

  • Causal Factors of IT Downtime
    Operator Error: 60%
    System Outages: 20%
    Application Failure: 20%
    Security Related: 5%; Non-Security Related: 15%
    Source: IDC, 2004

  • Capability Levels
    4 - Continuously Improving

  • Why Auditors Do The Things They Do
    Given enough time and resources, auditors would love to count all the beans
      Go into the warehouses, open up all the containers, and inspect all the contents
      Rarely does this actually happen, for obvious reasons
    Instead, auditors go to the bean counting machine to see whether the results are trustworthy
      What controls ensure that it hasn't been subverted?
      What controls ensure that the results are correct?
    For a variety of reasons, auditors are shifting from substantive audits to control audits

  • IT Controls 101
    Preventative Controls
      Separation of duties
      Change management and authorization processes
    Detective Controls
      Production controls around change management and configuration management
    Corrective Controls
      Restoration and backup systems

  • Ideal Attestation of Controls
    High-performing shops typically have the highest service levels and the lowest cost of controls
      Best service levels (MTTR, MTBF), lowest amount of unplanned, unscheduled work, highest server-to-sysadmin ratio
      Best working relationship with audit. Least amount of time dedicated to compliance activities
    Why?
      They can point to their change management and governance process (preventative controls)
      They can show that the processes are working (detective controls)
    How?
      Change management meeting minutes
      Three-ring binder of change orders and verified changes

  • COBIT and Change Management

  • COBIT AI6: Managing Changes

  • COBIT DS9: Managing The Configuration

  • The Tragic Truth About Auditors
    Auditors gravitate to where controls appear weakest
    To attract the attention of auditors, have unexplained outages and lots of unexplained changes
    "The top leading indicators of risk when we look at an IT operation are: poor service levels and unusual velocity of changes." Bill Philhower

  • Visible Ops: Four Steps To Build An Effective Change Management Process
    Each of the four Visible Ops steps is:
      A finite project: not an ISO 9001 initiative or a vague 5-year vision
      Catalytic: returns more resources to the organization than it consumes, fueling the next steps
      Sustaining: process stays in place, even when the initial force behind it disappears
      Auditable: supports factual reporting and attestation to process adherence and consistency
      Ordered: must be done in the specified order to achieve the above
    Model based on five years studying high-performing IT Ops and Security organizations
    Visible Ops has been donated to the ITPI

  • Visible Ops: Four Steps To Build An Effective Change Management Process
    Phase 1: Electrify Fence, Modify First Response
    Phase 2: Catch and Release, Find Fragile Artifacts
    Phase 3: Establish Repeatable Build Library
    Phase 4: Continually improve
    Tripwire enforces the change process.
    Tripwire rules out change as early as possible in the repair cycle.
    Tripwire protects fragile artifacts.
    Tripwire enforces change freeze and prevents configuration drift.
    Tripwire captures known good state in preproduction.
    Tripwire captures production changes that need to be baked into the build.
    Tripwire detects change, which all process areas hinge upon.

  • Phase 1: Stabilize Patient, Modify First Response
    Tripwire and IP Services

  • Issues
    We have a tendency to light and fight our own fires
      80% of outages are self-inflicted
      80% of MTTR is dominated by asking "what changed?"
    With sufficiently low change success rate, high rate of change, and high MTTR, we are spending all our time doing unplanned, unscheduled work
      Best in class: 5% of OpEx is spent on unplanned work
      Average: estimated around 25-45%
    Changes are made without authorization, proactive scheduling, or full documentation
    "The most likely way the world will be destroyed, most experts agree, is by accident. That's where we come in; we're computer professionals. We cause accidents." Nathaniel Borenstein

  • Stabilize Patient
    Curb the major cause of outages: 80% of outages are self-inflicted
    Identify critical patients, clear everyone away from them unless they are authorized to operate
    Document this new change policy: no changes unless authorized (preventative)
    At this point, anyone even holding a scalpel should be viewed with suspicion

  • Electrify The Fence
    We have now prescribed our first preventative change process and policy
      Why do most change management initiatives fail?
      What is the top audit finding around change controls?
    Now we must manage by fact instead of manage by belief by electrifying the fences
      No one is allowed to be inside the change fence except on the weekends
      Why did Joe Bob touch the fence on Monday at 2:11am?
    Document what should happen to Joe Bob:
      Public shaming, take a day off, or more
    "What is often overlooked is that if one person can single-handedly save the ship, that one person can probably single-handedly sink the ship, too." -- Unknown
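The "electrified fence" is a detective control: any detected change that falls outside the authorized window gets flagged and traced to a person. A minimal sketch, assuming the window is "weekends only" as in the slide's example and that detected changes arrive as (person, timestamp) pairs; the function names and record shape are hypothetical, not part of any Tripwire product:

```python
from datetime import datetime

def inside_change_window(ts: datetime) -> bool:
    """Assumed policy from the slide: changes are allowed only on weekends."""
    return ts.weekday() >= 5  # Monday is 0, so Saturday is 5 and Sunday is 6

def fence_violations(detected_changes):
    """Return the (person, timestamp) pairs detected outside the window."""
    return [(who, ts) for who, ts in detected_changes
            if not inside_change_window(ts)]

# Hypothetical detected changes; 2004-10-25 was a Monday, 2004-10-23 a Saturday.
detected = [
    ("joe.bob", datetime(2004, 10, 25, 2, 11)),
    ("alice", datetime(2004, 10, 23, 2, 11)),
]
violations = fence_violations(detected)
```

The point of the sketch is that the question "why did Joe Bob touch the fence?" is answered from recorded facts rather than from belief.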

  • Create Change Team
    Get all necessary stakeholders who can best make decisions about changes, encompassing business goals, operational risks, technical risks, etc.
    Key stakeholders for us: Security Lead, Ops Systems Engineering Lead, VP of Operations, Service Desk Manager, Director of Network Operations, and Internal Audit
    Create weekly change management meetings, mandatory for all CAB (change advisory board) members.

  • Hold Weekly Change Management Meetings
    Create a path from desired change, to requested change, authorized change, scheduled change, implemented change, verified change.
    Review implemented changes and ensure that all actual changes mapped to authorized work
    Enable highest change throughput for the organization, best serve business needs, with the least amount of bureaucracy possible
    Weekly 15-minute change management meetings are possible, with practice
    Keep good records of requested changes, authorized changes, and scheduled changes
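The path from desired change to verified change can be read as a strictly ordered lifecycle in which no stage may be skipped. A minimal sketch, assuming each change record carries exactly one state; the `ChangeState` enum and `advance` helper are illustrative names, not something the deck defines:

```python
from enum import IntEnum

class ChangeState(IntEnum):
    """The six stages named on the slide, in order."""
    DESIRED = 0
    REQUESTED = 1
    AUTHORIZED = 2
    SCHEDULED = 3
    IMPLEMENTED = 4
    VERIFIED = 5

def advance(state: ChangeState) -> ChangeState:
    """Move a change to the next stage; stages cannot be skipped."""
    if state is ChangeState.VERIFIED:
        raise ValueError("change is already verified")
    return ChangeState(state + 1)

def may_implement(state: ChangeState) -> bool:
    """Only a scheduled (hence already authorized) change may be implemented."""
    return state is ChangeState.SCHEDULED
```

Modeling the stages as an ordered type makes "implemented but never authorized" something a record cannot express, which is the kind of evidence the weekly review is meant to produce.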

  • Change Management Guidelines
    Don't:
      Don't authorize changes that do not have rollback plans that everybody reviews
      Don't allow rubber-stamping approval of changes
      Don't let any system changes off the hook: someone made it, so understand what caused it
    Do:
      Do post-implementation reviews to determine whether the change succeeded or not
      Do track the change success rate
      Do use the change success rate to avoid making historically risky changes
    "It's not the strongest species that survive, nor the most intelligent, but the one most responsive to change." Charles Darwin
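Tracking the change success rate and using it to spot historically risky changes can be sketched as below. This assumes post-implementation reviews are recorded as (change type, succeeded) pairs; the change-type labels and the 0.8 threshold are made-up illustrations, not figures from the deck:

```python
from collections import defaultdict

def success_rates(reviews):
    """Map each change type to its fraction of successful changes."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for change_type, succeeded in reviews:
        totals[change_type] += 1
        if succeeded:
            successes[change_type] += 1
    return {t: successes[t] / totals[t] for t in totals}

def historically_risky(reviews, threshold=0.8):
    """Change types whose success rate falls below the (assumed) threshold."""
    rates = success_rates(reviews)
    return sorted(t for t, rate in rates.items() if rate < threshold)

# Hypothetical post-implementation review records.
reviews = [
    ("firewall rule", True),
    ("firewall rule", False),
    ("kernel patch", True),
]
```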

  • Spectrum: Managing Change
    Don't expect to be doing closed-loop change management right out of the chute: awareness is better than being oblivious, managed is better than unmanaged!
    Spectrum
      Oblivious to change: "Hey, did the switch just reboot?"
      Aware of change: "Hey, who just rebooted the switch?"
      Announcing change: "Hey, I'm rebooting the switch. Let me know if that will cause a problem."
      Authorizing change: "Hey, I need to reboot the switch. Who needs to authorize this?"
      Scheduling change: "When is the next maintenance window - I'd like to reboot the switch then?"
      Verifying change: "Looking at the fault manager logs, I can see that the switch rebooted as scheduled."
    This is what SO-404 requires! (Preventative and detective controls)

  • Create Trusted Authorized Work Queue and Change Calendar
    Create a work ticketing system that contains all the authorized work that went through the change management process
    Create a change calendar (Forward Schedule Of Change) that the change manager uses to coordinate resources, manage risks, etc.
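One use of a change calendar is spotting scheduled changes that collide on the same asset. A minimal sketch, assuming each calendar entry is a dict with hypothetical `ticket`, `asset`, `start`, and `end` fields; a real Forward Schedule Of Change would carry more context (owner, risk, rollback plan):

```python
from datetime import datetime
from itertools import combinations

def overlapping_changes(calendar):
    """Return ticket pairs scheduled on the same asset at overlapping times."""
    conflicts = []
    for a, b in combinations(calendar, 2):
        same_asset = a["asset"] == b["asset"]
        overlap = a["start"] < b["end"] and b["start"] < a["end"]
        if same_asset and overlap:
            conflicts.append((a["ticket"], b["ticket"]))
    return conflicts

# Hypothetical calendar entries for one weekend maintenance window.
calendar = [
    {"ticket": "CHG-1", "asset": "core-switch",
     "start": datetime(2004, 10, 30, 1, 0), "end": datetime(2004, 10, 30, 3, 0)},
    {"ticket": "CHG-2", "asset": "core-switch",
     "start": datetime(2004, 10, 30, 2, 0), "end": datetime(2004, 10, 30, 4, 0)},
    {"ticket": "CHG-3", "asset": "mail-server",
     "start": datetime(2004, 10, 30, 2, 0), "end": datetime(2004, 10, 30, 4, 0)},
]
```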

  • Modify First Response (1/2)
    The key to a catalytic change management process is that it must return value back to the organization
    Decrease MTTR, 80% of which is dominated by people asking "what changed?", by integrating the change management process into problem management
    Whenever problem managers are mobilized, have all authorized changes and actual changes in the work ticket
    The Microsoft MOF study showed that their best-in-class customers rebooted their servers 20x less often, and also had 5x fewer blue screens of death.

  • Modify First Response (2/2)
    Eliminate change as early as possible by identifying the assets directly involved in the ticket and auditing them against their configuration baseline for the last 72 hours. All changes found are attached to the ticket.
    If no changes are found, the circle is widened to include changes made to infrastructure supporting the target systems.
    "Grant me the Serenity to accept the things I cannot change, Courage to change the things I can, and Wisdom to know the difference." Dr. Reinhold Niebuhr (excerpt from the Serenity Prayer)
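The first-response procedure on this slide (check the ticket's own assets for changes in the last 72 hours, then widen the circle to supporting infrastructure) can be sketched as follows. The change-log record shape, the `supporting` dependency map, and the function name are assumptions made for illustration:

```python
from datetime import datetime, timedelta

def changes_for_ticket(ticket_assets, supporting, change_log, now,
                       lookback_hours=72):
    """Return recent changes to attach to a trouble ticket.

    First look at the assets directly involved in the ticket. Only if
    nothing changed there, widen to the infrastructure supporting them.
    """
    cutoff = now - timedelta(hours=lookback_hours)
    recent = [c for c in change_log if c["when"] >= cutoff]
    direct = [c for c in recent if c["asset"] in ticket_assets]
    if direct:
        return direct
    widened = set()
    for asset in ticket_assets:
        widened.update(supporting.get(asset, ()))
    return [c for c in recent if c["asset"] in widened]

# Hypothetical data: the web server did not change, but its switch did.
now = datetime(2004, 10, 27, 12, 0)
change_log = [
    {"asset": "edge-switch", "when": datetime(2004, 10, 26, 23, 0)},
    {"asset": "web-01", "when": datetime(2004, 10, 20, 9, 0)},
]
supporting = {"web-01": ["edge-switch", "db-01"]}
found = changes_for_ticket({"web-01"}, supporting, change_log, now)
```

If both passes come back empty, change has been ruled out early and the problem managers can move on to other causes.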

  • Phase 1: What You Have Built
    Documented correct path from desired change to authorized change, scheduled change, implemented change, and verified change
    Created documentation that the process is working
    Returning value back to IT Ops by reducing MTTR, increasing change success rate and effective change throughput

  • What To Show The SO-404 Teams
    Change governance and management processes
      Meeting minutes of the change management meetings
      Authorization processes
    Three-ring binder of stapled items:
      Authorized work order
      Change report on infrastructure showing correct changes made
      Signature of change manager verifying correct implementation of change

  • What To Show The Auditors
    List of all outages and unscheduled downtime
    Change management metrics
      Change rate (per week)
      Change success rate
      MTTR, MTBF
    This would make most auditors breathe a sigh of relief
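MTTR and MTBF can be derived directly from the outage list. A minimal sketch using the common definitions (mean repair time per outage, and mean operating time between failures over an observation period), with outages given as (start_hour, end_hour) pairs measured from the start of the period; the deck does not specify how its figures were computed, so treat this as one reasonable convention:

```python
def mttr_hours(outages):
    """Mean time to repair: total downtime divided by number of outages."""
    return sum(end - start for start, end in outages) / len(outages)

def mtbf_hours(outages, period_hours):
    """Mean time between failures: total uptime divided by number of outages."""
    downtime = sum(end - start for start, end in outages)
    return (period_hours - downtime) / len(outages)

# Hypothetical week (168 hours) with two outages lasting 2 and 4 hours.
outages = [(10, 12), (100, 104)]
week_mttr = mttr_hours(outages)       # (2 + 4) / 2
week_mtbf = mtbf_hours(outages, 168)  # (168 - 6) / 2
```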


  • Which Metric Do You Want To Improve?
    Release
      Time to provision known good build
      # turns to a known good build
      Shelf life of build
      % of systems that match known good build
      % of builds that have security sign-off
      # of fast-tracked builds
      Ratio of release engineers to sysadmins
    Controls
      # of changes authorized per week
      # of actual changes made per week
      Change success rate
      # of emergency changes
      # of service-affecting outages
      # of special changes
      # of business as usual changes
      Change management overhead
      Configuration variance
    Resolution
      MTTR, MTBF
      % of time spent on unplanned work

    Phase 4

  • Why Is Unplanned Work Such A Good Indicator?
    (# of production changes) x (failed change % or unauthorized changes) x (mean time to repair) = % of time spent on unplanned work
    Average: 35-45% of OpEx spent on unplanned work!
    Impact: late projects, rework, compliance issues, uncontrolled variance, etc.

  • What Affects These Variables?
    (# of production changes) x (failed change % or unauthorized changes) x (mean time to repair) = % of time spent on unplanned work

    Behaviors that increase change success rate:
      Effective change testing
      Effective risk review when approving changes
      Effective identification of change stakeholders
      Effective change scheduling

    Behaviors that reduce unauthorized changes:
      Culture of change management
      Management ownership of change process
      Effective monitoring of infrastructure with detective controls to enforce change process
      Management use of corrective action when change processes are not followed

    Behaviors that decrease MTTR:
      Culture of causality: desire to rule out change first in problem repair cycle
      Effective change management process that can report on authorized and scheduled changes
      Ability to distinguish planned and unplanned outage events
      Effective communications around scheduled changes
      Effective monitoring of infrastructure for production changes

  • What Do These Transformations Look Like?
    Examples
      Joe Judge at Adero
      Ken Larson at Schlumberger-SEMA
      Kevin Behr at IP Services
    Financial returns of process transformations
      Increased availability and decreased MTTR
      Reduction of unplanned work from 50% to 5% of OpEx
      Increased delivered capacity by 2x with 10% increase in OpEx
      Increased delivery of planned projects that deliver higher value to the business
      Fulfilled compliance and reduced cost of compliance

  • Why Do Auditors Love Continuous Improvement?
    Controls are owned by the business to meet business objectives, instead of being there only to make auditors happy!
    Auditors hate dragging organizations to implement controls, especially if it creates grudging and literal interpretations of findings
    Continuous improvement requires process and controls, to detect and reduce variance

  • ITIL and COBIT
    ITIL defines the set of all IT operational processes; COBIT defines all the controls that can be wrapped around them
    ITIL and COBIT are complementary and orthogonal:
      Six Sigma defines how to build processes and their corresponding controls to continually monitor and reduce variance
      ITIL defines the change management processes
      COBIT defines the controls to ensure that the ITIL processes are auditable and effective

  • Caught in the Crossfire of Change
    Rate of change is increasing with no signs of slowing
    SarbOx, GLBA, CISP, etc.
    Distributed systems
    Heterogeneous environments
    Service levels
    Risk mitigation
    Business objectives
    Quality improvement
    Staffing & Budgets

  • Getting Control of Change
    Control frameworks prescribe internal controls to enhance operational performance, security, and regulatory compliance
      COBIT, ITIL, ISO17799, SAS70
    Diagram labels: Preventive, Corrective, Change Management, Detective

  • Tripwire Change Auditing Solutions
    Actual changes are detected on production systems and reconciled with approved and intended changes
    Change auditing results then flow back to change tickets, trouble tickets, audit and management reports, plus configuration management databases (CMDB)
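Reconciling detected changes against approved changes amounts to a set comparison. A minimal sketch, assuming both sides can be reduced to comparable (asset, item) identifiers; this illustrates the idea only and is not Tripwire's actual data model or API:

```python
def reconcile(detected, approved):
    """Split changes into verified, unauthorized, and not-yet-implemented."""
    detected, approved = set(detected), set(approved)
    return {
        "verified": sorted(detected & approved),         # approved and seen
        "unauthorized": sorted(detected - approved),     # seen, never approved
        "not_implemented": sorted(approved - detected),  # approved, not seen
    }

# Hypothetical identifiers: (asset, changed item).
detected = [("web-01", "/etc/httpd.conf"), ("db-01", "/etc/passwd")]
approved = [("web-01", "/etc/httpd.conf"), ("fw-01", "ruleset")]
report = reconcile(detected, approved)
```

Each bucket maps to a destination named on the slide: verified changes close out change tickets, while unauthorized ones belong in trouble tickets and audit reports.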

  • Can You Answer These Questions?
    Pick any piece of your infrastructure (router, server, firewall, etc.)
      If a change is made to this device, how will you know?
      How soon will you know?
      How will you know if the change is good or bad?
      How long will that process take?
      What happens when the change is good?
      What happens when the change is bad?
      How do you verify that each change has been reconciled?
      How do you report on all of the above?
      Can you provide a historical report accounting for all changes in your environment?
    This is what auditors want to know about how changes are managed in your IT infrastructure
    With Tripwire, you can answer all of these questions

  • Improving Service Quality And Availability
    Problem: Change management in place, but lacked enforcement
      Saw changes occurring, but didn't have the means to validate
    Customer: IT Services operations of a major energy services company
    Tripwire solution:
      Tripwire detects change and puts teeth in the process
      Tracking What, When, Who, How and Why a change was made
      Tripwire provides black-and-white documentation to enforce process
      Increased staff efficiency, uptime, and service quality
    "We used to spend 45% to 50% of our time on unplanned work. Now it's around 5%."
    "In spite of force reductions, customers describe our services as phenomenally better now."

  • Get Involved!
    Join ICOPL (ITPI Community Of Practice List-Serv)
      http://www.itpi.org/home/icopl.php
    There is now a Visible Ops Pocket Guide!
      http://www.itpi.org/home/visibleops.php
    We are looking for volunteers to help with our research projects.
    IMCA is now online at the ITPI
      http://www.itpi.org/home/imca.php
    If you have a high-performing organization, we want to study you!

  • Summary
    Control is possible. We merely need to look at the high-performing IT organizations to confirm this.
    Transformation is possible. Visible Ops is the result of years of studying high-performing IT operations and security organizations in conjunction with the ITPI.
    Visible Ops illustrates how interested organizations might replicate the processes of these high-performing organizations in just four achievable steps.
    Gene Kim: [email protected]

    Kevin Behr - Integrating Controls and Process Improvement

    ICOPL List-Serv Instructions:

    To subscribe: send a blank e-mail (subject and body are ignored) to [email protected].
    To unsubscribe: send a blank e-mail to [email protected].
    To email the list: send email to [email protected].
    Go to http://www.itpi.org for more information!