21
Stephen S. Yau Stephen S. Yau CSE465-591, Fall 2006 CSE465-591, Fall 2006 1 Contingency and Contingency and Disaster Recovery Disaster Recovery Planning Planning

Stephen S. Yau CSE465-591, Fall 2006 1 Contingency and Disaster Recovery Planning

  • View
    218

  • Download
    0

Embed Size (px)

Citation preview

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 11

Contingency and Contingency and Disaster Recovery Disaster Recovery

PlanningPlanning

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 22

DefinitionsDefinitionsContingency planning:Contingency planning:– Multifaceted approaches to ensure Multifaceted approaches to ensure

critical system and network assets critical system and network assets remain functionally reliableremain functionally reliable

Disaster recovery planning:Disaster recovery planning:– Describing exact steps and Describing exact steps and

procedures the personnel in key procedures the personnel in key departments must follow in order departments must follow in order to recover critical information to recover critical information systems in case of a disaster that systems in case of a disaster that causes loss of access to systems.causes loss of access to systems.

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 33

Other Security PlanningOther Security PlanningNSTISS-National Security NSTISS-National Security Telecommunications and Information Telecommunications and Information Systems SecuritySystems Security– NSTISS policies: NSTISS policies:

http://www.cnss.gov/policies.htmlhttp://www.cnss.gov/policies.html– NSTISS directives: NSTISS directives:

http://www.cnss.gov/directives.htmlhttp://www.cnss.gov/directives.html– NSTISS training:NSTISS training:

ASU has already received certificates of ASU has already received certificates of courseware for NSTISSI 4011 and CNSSI courseware for NSTISSI 4011 and CNSSI 4012 4012

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 44

Why Do We Need Why Do We Need Planning?Planning?

No 100% secure system. No matter No 100% secure system. No matter how much effort we put in risk how much effort we put in risk management and try to prevent bad management and try to prevent bad things from happening, bad things things from happening, bad things will happen.will happen.

What can we do?What can we do?– Plan for the worstPlan for the worst

Not just how we will restore serviceNot just how we will restore service

Also plan on how we will continue to Also plan on how we will continue to provide service in case of a disasterprovide service in case of a disaster

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 55

When Do We Need When Do We Need Planning?Planning?

Contingency plan/disaster recovery plan is ready Contingency plan/disaster recovery plan is ready to be invoked whenever a disruption to the to be invoked whenever a disruption to the system occurssystem occurs– Need these plans long before a disaster occurs, Need these plans long before a disaster occurs,

usually in parallel with usually in parallel with Risk ManagementRisk Management..Plans should cover for occurrence of at least the Plans should cover for occurrence of at least the following events:following events:– Equipment failureEquipment failure– Power outage or telecommunication network Power outage or telecommunication network

shutdownshutdown– Application software corruptionApplication software corruption– Human error, sabotage or strikeHuman error, sabotage or strike– Malicious software attacksMalicious software attacks– Hacking or other Internet attacksHacking or other Internet attacks– Terrorist attacksTerrorist attacks– Natural disastersNatural disasters

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 66

Plan ComponentsPlan ComponentsA contingency/disaster recovery plan should A contingency/disaster recovery plan should include the following information:include the following information:– Measure disruptive events Measure disruptive events – Contingency plan: response procedures and Contingency plan: response procedures and

continuity of operationscontinuity of operations– Backup requirementsBackup requirements– Plan for recovery actions after a disruptive eventPlan for recovery actions after a disruptive event– Procedures for off-site processingProcedures for off-site processing– Guidelines for determining critical and essential Guidelines for determining critical and essential

workloadworkload– Individual employees’ responsibilities in Individual employees’ responsibilities in

response to emergency situationsresponse to emergency situations– Emergency destructive proceduresEmergency destructive procedures

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 77

Measure Disruptive Measure Disruptive EventsEventsIdentify and evaluate possible disruptive Identify and evaluate possible disruptive

events:events:– Identify most critical processes and Identify most critical processes and

requirements for continuing to operate in requirements for continuing to operate in the event of an emergencythe event of an emergency

– Identify resources required to support Identify resources required to support most critical processesmost critical processes

– Define disasters, and analyze possible Define disasters, and analyze possible damage to most critical processes and damage to most critical processes and their required resourcestheir required resources

– Define steps of escalation in declaring a Define steps of escalation in declaring a disasterdisaster

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 88

Response Procedures and Response Procedures and Continuity of OperationsContinuity of Operations

Reporting proceduresReporting procedures– Internal: notify IA personnel, management and Internal: notify IA personnel, management and

related departmentsrelated departments– External: notify public agencies, media, External: notify public agencies, media,

suppliers and customerssuppliers and customers

Determine immediate actions to be taken Determine immediate actions to be taken after a disaster happensafter a disaster happens– Protection of personnelProtection of personnel– Containment of the incidentContainment of the incident– Assessment of the effectAssessment of the effect– Decisions on the optimum actions to be takenDecisions on the optimum actions to be taken– Taking account of the power of public Taking account of the power of public

authoritiesauthorities

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 99

Backup RequirementsBackup RequirementsCritical data and system files must have Critical data and system files must have backups stored off-site. Backups are used tobackups stored off-site. Backups are used to– Restore data when normal data storage is Restore data when normal data storage is

unavailableunavailable– Provide online access when the main system is Provide online access when the main system is

downdown

Not all data needs to be online or available at Not all data needs to be online or available at all timesall times– Backup takes time and need additional storage Backup takes time and need additional storage

spacespace– Require extra effort to keep backup consistent Require extra effort to keep backup consistent

with normal data storagewith normal data storage– Backup should consider data-production rates Backup should consider data-production rates

and data-loss risk and be cost-effectiveand data-loss risk and be cost-effective

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1010

Backup Requirements Backup Requirements (cont.)(cont.)Decide Decide whatwhat and and how oftenhow often to backup depending to backup depending on risks:on risks:– Immediate losses of servicesImmediate losses of services:: power failure or power failure or

application crash that any data that has not been saved application crash that any data that has not been saved will be lost. If the data is critical, users must be aware will be lost. If the data is critical, users must be aware of this risk and make periodic “saves” themselvesof this risk and make periodic “saves” themselves

– Media lossesMedia losses:: storage media has physical damage and storage media has physical damage and can no longer be read. Need to decidecan no longer be read. Need to decide

How often to do a complete backup?How often to do a complete backup?

Will incremental backups be done between two complete Will incremental backups be done between two complete backups?backups?

What media will be used for backup?What media will be used for backup?

– Archiving inactive dataArchiving inactive data:: recent active data should be recent active data should be put onto a hard disk for fast access, while old inactive put onto a hard disk for fast access, while old inactive data can be archived to tape, CD or DVD in order to data can be archived to tape, CD or DVD in order to free hard disk spacefree hard disk space

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1111

Plan for Recovery Plan for Recovery ActionsActionsHigh-level management must decide what the High-level management must decide what the

organization should do after a disruptive event organization should do after a disruptive event happens. Possible choices:happens. Possible choices:– Do nothingDo nothing: loss is tolerable; rarely happens and : loss is tolerable; rarely happens and

cost more to correct it.cost more to correct it.– Seek for insurance compensationSeek for insurance compensation: provides : provides

financial support in the event of loss, but does not financial support in the event of loss, but does not provide protection for the organization’s provide protection for the organization’s reputation.reputation.

– Loss mitigationLoss mitigation: isolate the damage and try to bring : isolate the damage and try to bring the system back online as soon as possible.the system back online as soon as possible.

– Bring off-site system online for continuous Bring off-site system online for continuous operationoperation: maintain an off-site backup system that : maintain an off-site backup system that will kick in when the a disruptive event has made will kick in when the a disruptive event has made the original system unavailable.the original system unavailable.

Identify all possible choices, including Identify all possible choices, including cost/benefit analysis and present cost/benefit analysis and present recommendations to high-level management recommendations to high-level management for approval.for approval.

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1212

Off-site ProcessingOff-site ProcessingChoices for off-site processing:Choices for off-site processing:– Cold siteCold site: an empty facility located offsite with : an empty facility located offsite with

necessary infrastructure ready for installation of necessary infrastructure ready for installation of back-up system in the event of a disasterback-up system in the event of a disaster

– Mutual backupMutual backup:: two organizations with similar two organizations with similar system configuration agree to serve as a backup system configuration agree to serve as a backup site for each othersite for each other

– Hot siteHot site:: a site with hardware, software and a site with hardware, software and network installed and compatible to production sitenetwork installed and compatible to production site

– Remote journalingRemote journaling:: online transmission of online transmission of transaction data to backup system periodically to transaction data to backup system periodically to minimize loss of data and reduce recovery timeminimize loss of data and reduce recovery time

– Mirrored siteMirrored site: a site equips with a system identical : a site equips with a system identical to the production system with mirroring facility. to the production system with mirroring facility. Data is mirrored to backup system immediately. Data is mirrored to backup system immediately. Recovery is transparent to usersRecovery is transparent to users

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1313

Off-site Processing Off-site Processing (cont.)(cont.)

Mirrored Site

Remote Journaling

Hot SiteMutual Backup

Cold Site

Restore time

Cost

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1414

Decision Factors Decision Factors for Off-site Processingfor Off-site Processing

Availability of facilityAvailability of facility

Ability to maintain redundant Ability to maintain redundant equipmentequipment

Ability to maintain redundant Ability to maintain redundant network capacitynetwork capacity

Relationships with vendors to Relationships with vendors to provide immediate replacement or provide immediate replacement or assistanceassistance

Adequacy of fundingAdequacy of funding

Availability of skilled personnelAvailability of skilled personnel

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1515

Guidelines for Determining Guidelines for Determining Critical and Essential Critical and Essential

WorkloadWorkloadUnderstand system’s mission goalUnderstand system’s mission goal

Identify mission critical processesIdentify mission critical processes

Identify dependencies among different Identify dependencies among different departments/personnel within the departments/personnel within the organizationorganization

Understand influence of external factorsUnderstand influence of external factors– Government agenciesGovernment agencies– CompetitorsCompetitors– RegulatorsRegulators

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1616

Individual Responsibilities Individual Responsibilities in Emergency Responsein Emergency Response

Emergency response planning coordinator: Emergency response planning coordinator: coordinating the following activities:coordinating the following activities:– Establish contingency/disaster recovery plansEstablish contingency/disaster recovery plans– Maintain/modify the plansMaintain/modify the plans– Audit the plansAudit the plans

High-level manager (department manager, VP, etc.)High-level manager (department manager, VP, etc.)– Understand process and mission goal of Understand process and mission goal of

organizationorganization– Monitor contingency/disaster recovery plans and Monitor contingency/disaster recovery plans and

keep plans updatedkeep plans updatedAll other employeesAll other employees– Know contingency/disaster recovery plansKnow contingency/disaster recovery plans– Understand own responsibilities and expectations Understand own responsibilities and expectations

during operationduring operation– Know whom to contact if something not covered Know whom to contact if something not covered

in plan happensin plan happens

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1717

Emergency Destructive Emergency Destructive ProceduresProcedures

Under certain situations, an emergency response Under certain situations, an emergency response may focus on destroying data rather than restoring may focus on destroying data rather than restoring datadata– Physical protection of system is no longer Physical protection of system is no longer

available available – Critical assets (product design documents, list of Critical assets (product design documents, list of

sensitive customer or supplier, etc.) sensitive customer or supplier, etc.)

An emergency destructive plan should containAn emergency destructive plan should contain– Prioritized items that may need to be destroyedPrioritized items that may need to be destroyed– Backup procedure about critical data at a secure Backup procedure about critical data at a secure

off-site locationoff-site location– Specify who has authority to invoke destructive Specify who has authority to invoke destructive

planplan

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1818

Test a Contingency/Disaster Test a Contingency/Disaster Recovery PlanRecovery Plan

Testing is a necessary and essential Testing is a necessary and essential step in planning process:step in planning process:– A plan may look great on paper, but A plan may look great on paper, but

until it is carried out, no one knows until it is carried out, no one knows how it will performhow it will perform

– Testing not only shows the plan is Testing not only shows the plan is viable, but also prepares personnel viable, but also prepares personnel involved by practicing their involved by practicing their responsibilities and removing possible responsibilities and removing possible uncertaintyuncertainty

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1919

Test a Contingency/Disaster Test a Contingency/Disaster Recovery Plan Recovery Plan (cont.)(cont.)

Five methods of testing such a planFive methods of testing such a plan– Walk-throughWalk-through: members of key units : members of key units

meet to trace their steps through the meet to trace their steps through the plan, looking for omissions and plan, looking for omissions and inaccuraciesinaccuracies

– SimulationSimulation: during a practice session, : during a practice session, critical personnel meet to perform critical personnel meet to perform “dry run” of the emergency, “dry run” of the emergency, mimicking the response to true mimicking the response to true emergency as closely as possibleemergency as closely as possible

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 2020

Test a Contingency/Disaster Test a Contingency/Disaster Recovery Plan Recovery Plan (cont.)(cont.)

– ChecklistChecklist:: more passive type of testing, more passive type of testing, members of the key units “check off” the members of the key units “check off” the tasks on list for which they are tasks on list for which they are responsible. Report accuracy of the listresponsible. Report accuracy of the list

– Parallel testingParallel testing:: backup processing backup processing occurs in parallel with production services occurs in parallel with production services that never stop. If testing fails, normal that never stop. If testing fails, normal production will not be affected.production will not be affected.

– Full interruptionFull interruption:: production systems production systems are stopped as if a disaster had occurred are stopped as if a disaster had occurred to see how backup services performto see how backup services perform

Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 2121

ReferencesReferences

M. Merkow, J. Breithaupt, M. Merkow, J. Breithaupt, Information Security: Principles and Information Security: Principles and PracticesPractices,, Prentice Hall, August Prentice Hall, August 2005, 448 pages, ISBN 0131547291 2005, 448 pages, ISBN 0131547291

J. G. Boyce, D. W. Jennings, J. G. Boyce, D. W. Jennings, Information AssuranceInformation Assurance:: Managing Managing Organizational IT Security RisksOrganizational IT Security Risks. . Butterworth Heineman, 2002, ISBN Butterworth Heineman, 2002, ISBN 0-7506-7327-30-7506-7327-3