View
218
Download
0
Embed Size (px)
Citation preview
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 11
Contingency and Contingency and Disaster Recovery Disaster Recovery
PlanningPlanning
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 22
DefinitionsDefinitionsContingency planning:Contingency planning:– Multifaceted approaches to ensure Multifaceted approaches to ensure
critical system and network assets critical system and network assets remain functionally reliableremain functionally reliable
Disaster recovery planning:Disaster recovery planning:– Describing exact steps and Describing exact steps and
procedures the personnel in key procedures the personnel in key departments must follow in order departments must follow in order to recover critical information to recover critical information systems in case of a disaster that systems in case of a disaster that causes loss of access to systems.causes loss of access to systems.
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 33
Other Security PlanningOther Security PlanningNSTISS-National Security NSTISS-National Security Telecommunications and Information Telecommunications and Information Systems SecuritySystems Security– NSTISS policies: NSTISS policies:
http://www.cnss.gov/policies.htmlhttp://www.cnss.gov/policies.html– NSTISS directives: NSTISS directives:
http://www.cnss.gov/directives.htmlhttp://www.cnss.gov/directives.html– NSTISS training:NSTISS training:
ASU has already received certificates of ASU has already received certificates of courseware for NSTISSI 4011 and CNSSI courseware for NSTISSI 4011 and CNSSI 4012 4012
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 44
Why Do We Need Why Do We Need Planning?Planning?
No 100% secure system. No matter No 100% secure system. No matter how much effort we put in risk how much effort we put in risk management and try to prevent bad management and try to prevent bad things from happening, bad things things from happening, bad things will happen.will happen.
What can we do?What can we do?– Plan for the worstPlan for the worst
Not just how we will restore serviceNot just how we will restore service
Also plan on how we will continue to Also plan on how we will continue to provide service in case of a disasterprovide service in case of a disaster
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 55
When Do We Need When Do We Need Planning?Planning?
Contingency plan/disaster recovery plan is ready Contingency plan/disaster recovery plan is ready to be invoked whenever a disruption to the to be invoked whenever a disruption to the system occurssystem occurs– Need these plans long before a disaster occurs, Need these plans long before a disaster occurs,
usually in parallel with usually in parallel with Risk ManagementRisk Management..Plans should cover for occurrence of at least the Plans should cover for occurrence of at least the following events:following events:– Equipment failureEquipment failure– Power outage or telecommunication network Power outage or telecommunication network
shutdownshutdown– Application software corruptionApplication software corruption– Human error, sabotage or strikeHuman error, sabotage or strike– Malicious software attacksMalicious software attacks– Hacking or other Internet attacksHacking or other Internet attacks– Terrorist attacksTerrorist attacks– Natural disastersNatural disasters
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 66
Plan ComponentsPlan ComponentsA contingency/disaster recovery plan should A contingency/disaster recovery plan should include the following information:include the following information:– Measure disruptive events Measure disruptive events – Contingency plan: response procedures and Contingency plan: response procedures and
continuity of operationscontinuity of operations– Backup requirementsBackup requirements– Plan for recovery actions after a disruptive eventPlan for recovery actions after a disruptive event– Procedures for off-site processingProcedures for off-site processing– Guidelines for determining critical and essential Guidelines for determining critical and essential
workloadworkload– Individual employees’ responsibilities in Individual employees’ responsibilities in
response to emergency situationsresponse to emergency situations– Emergency destructive proceduresEmergency destructive procedures
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 77
Measure Disruptive Measure Disruptive EventsEventsIdentify and evaluate possible disruptive Identify and evaluate possible disruptive
events:events:– Identify most critical processes and Identify most critical processes and
requirements for continuing to operate in requirements for continuing to operate in the event of an emergencythe event of an emergency
– Identify resources required to support Identify resources required to support most critical processesmost critical processes
– Define disasters, and analyze possible Define disasters, and analyze possible damage to most critical processes and damage to most critical processes and their required resourcestheir required resources
– Define steps of escalation in declaring a Define steps of escalation in declaring a disasterdisaster
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 88
Response Procedures and Response Procedures and Continuity of OperationsContinuity of Operations
Reporting proceduresReporting procedures– Internal: notify IA personnel, management and Internal: notify IA personnel, management and
related departmentsrelated departments– External: notify public agencies, media, External: notify public agencies, media,
suppliers and customerssuppliers and customers
Determine immediate actions to be taken Determine immediate actions to be taken after a disaster happensafter a disaster happens– Protection of personnelProtection of personnel– Containment of the incidentContainment of the incident– Assessment of the effectAssessment of the effect– Decisions on the optimum actions to be takenDecisions on the optimum actions to be taken– Taking account of the power of public Taking account of the power of public
authoritiesauthorities
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 99
Backup RequirementsBackup RequirementsCritical data and system files must have Critical data and system files must have backups stored off-site. Backups are used tobackups stored off-site. Backups are used to– Restore data when normal data storage is Restore data when normal data storage is
unavailableunavailable– Provide online access when the main system is Provide online access when the main system is
downdown
Not all data needs to be online or available at Not all data needs to be online or available at all timesall times– Backup takes time and need additional storage Backup takes time and need additional storage
spacespace– Require extra effort to keep backup consistent Require extra effort to keep backup consistent
with normal data storagewith normal data storage– Backup should consider data-production rates Backup should consider data-production rates
and data-loss risk and be cost-effectiveand data-loss risk and be cost-effective
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1010
Backup Requirements Backup Requirements (cont.)(cont.)Decide Decide whatwhat and and how oftenhow often to backup depending to backup depending on risks:on risks:– Immediate losses of servicesImmediate losses of services:: power failure or power failure or
application crash that any data that has not been saved application crash that any data that has not been saved will be lost. If the data is critical, users must be aware will be lost. If the data is critical, users must be aware of this risk and make periodic “saves” themselvesof this risk and make periodic “saves” themselves
– Media lossesMedia losses:: storage media has physical damage and storage media has physical damage and can no longer be read. Need to decidecan no longer be read. Need to decide
How often to do a complete backup?How often to do a complete backup?
Will incremental backups be done between two complete Will incremental backups be done between two complete backups?backups?
What media will be used for backup?What media will be used for backup?
– Archiving inactive dataArchiving inactive data:: recent active data should be recent active data should be put onto a hard disk for fast access, while old inactive put onto a hard disk for fast access, while old inactive data can be archived to tape, CD or DVD in order to data can be archived to tape, CD or DVD in order to free hard disk spacefree hard disk space
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1111
Plan for Recovery Plan for Recovery ActionsActionsHigh-level management must decide what the High-level management must decide what the
organization should do after a disruptive event organization should do after a disruptive event happens. Possible choices:happens. Possible choices:– Do nothingDo nothing: loss is tolerable; rarely happens and : loss is tolerable; rarely happens and
cost more to correct it.cost more to correct it.– Seek for insurance compensationSeek for insurance compensation: provides : provides
financial support in the event of loss, but does not financial support in the event of loss, but does not provide protection for the organization’s provide protection for the organization’s reputation.reputation.
– Loss mitigationLoss mitigation: isolate the damage and try to bring : isolate the damage and try to bring the system back online as soon as possible.the system back online as soon as possible.
– Bring off-site system online for continuous Bring off-site system online for continuous operationoperation: maintain an off-site backup system that : maintain an off-site backup system that will kick in when the a disruptive event has made will kick in when the a disruptive event has made the original system unavailable.the original system unavailable.
Identify all possible choices, including Identify all possible choices, including cost/benefit analysis and present cost/benefit analysis and present recommendations to high-level management recommendations to high-level management for approval.for approval.
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1212
Off-site ProcessingOff-site ProcessingChoices for off-site processing:Choices for off-site processing:– Cold siteCold site: an empty facility located offsite with : an empty facility located offsite with
necessary infrastructure ready for installation of necessary infrastructure ready for installation of back-up system in the event of a disasterback-up system in the event of a disaster
– Mutual backupMutual backup:: two organizations with similar two organizations with similar system configuration agree to serve as a backup system configuration agree to serve as a backup site for each othersite for each other
– Hot siteHot site:: a site with hardware, software and a site with hardware, software and network installed and compatible to production sitenetwork installed and compatible to production site
– Remote journalingRemote journaling:: online transmission of online transmission of transaction data to backup system periodically to transaction data to backup system periodically to minimize loss of data and reduce recovery timeminimize loss of data and reduce recovery time
– Mirrored siteMirrored site: a site equips with a system identical : a site equips with a system identical to the production system with mirroring facility. to the production system with mirroring facility. Data is mirrored to backup system immediately. Data is mirrored to backup system immediately. Recovery is transparent to usersRecovery is transparent to users
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1313
Off-site Processing Off-site Processing (cont.)(cont.)
Mirrored Site
Remote Journaling
Hot SiteMutual Backup
Cold Site
Restore time
Cost
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1414
Decision Factors Decision Factors for Off-site Processingfor Off-site Processing
Availability of facilityAvailability of facility
Ability to maintain redundant Ability to maintain redundant equipmentequipment
Ability to maintain redundant Ability to maintain redundant network capacitynetwork capacity
Relationships with vendors to Relationships with vendors to provide immediate replacement or provide immediate replacement or assistanceassistance
Adequacy of fundingAdequacy of funding
Availability of skilled personnelAvailability of skilled personnel
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1515
Guidelines for Determining Guidelines for Determining Critical and Essential Critical and Essential
WorkloadWorkloadUnderstand system’s mission goalUnderstand system’s mission goal
Identify mission critical processesIdentify mission critical processes
Identify dependencies among different Identify dependencies among different departments/personnel within the departments/personnel within the organizationorganization
Understand influence of external factorsUnderstand influence of external factors– Government agenciesGovernment agencies– CompetitorsCompetitors– RegulatorsRegulators
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1616
Individual Responsibilities Individual Responsibilities in Emergency Responsein Emergency Response
Emergency response planning coordinator: Emergency response planning coordinator: coordinating the following activities:coordinating the following activities:– Establish contingency/disaster recovery plansEstablish contingency/disaster recovery plans– Maintain/modify the plansMaintain/modify the plans– Audit the plansAudit the plans
High-level manager (department manager, VP, etc.)High-level manager (department manager, VP, etc.)– Understand process and mission goal of Understand process and mission goal of
organizationorganization– Monitor contingency/disaster recovery plans and Monitor contingency/disaster recovery plans and
keep plans updatedkeep plans updatedAll other employeesAll other employees– Know contingency/disaster recovery plansKnow contingency/disaster recovery plans– Understand own responsibilities and expectations Understand own responsibilities and expectations
during operationduring operation– Know whom to contact if something not covered Know whom to contact if something not covered
in plan happensin plan happens
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1717
Emergency Destructive Emergency Destructive ProceduresProcedures
Under certain situations, an emergency response Under certain situations, an emergency response may focus on destroying data rather than restoring may focus on destroying data rather than restoring datadata– Physical protection of system is no longer Physical protection of system is no longer
available available – Critical assets (product design documents, list of Critical assets (product design documents, list of
sensitive customer or supplier, etc.) sensitive customer or supplier, etc.)
An emergency destructive plan should containAn emergency destructive plan should contain– Prioritized items that may need to be destroyedPrioritized items that may need to be destroyed– Backup procedure about critical data at a secure Backup procedure about critical data at a secure
off-site locationoff-site location– Specify who has authority to invoke destructive Specify who has authority to invoke destructive
planplan
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1818
Test a Contingency/Disaster Test a Contingency/Disaster Recovery PlanRecovery Plan
Testing is a necessary and essential Testing is a necessary and essential step in planning process:step in planning process:– A plan may look great on paper, but A plan may look great on paper, but
until it is carried out, no one knows until it is carried out, no one knows how it will performhow it will perform
– Testing not only shows the plan is Testing not only shows the plan is viable, but also prepares personnel viable, but also prepares personnel involved by practicing their involved by practicing their responsibilities and removing possible responsibilities and removing possible uncertaintyuncertainty
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 1919
Test a Contingency/Disaster Test a Contingency/Disaster Recovery Plan Recovery Plan (cont.)(cont.)
Five methods of testing such a planFive methods of testing such a plan– Walk-throughWalk-through: members of key units : members of key units
meet to trace their steps through the meet to trace their steps through the plan, looking for omissions and plan, looking for omissions and inaccuraciesinaccuracies
– SimulationSimulation: during a practice session, : during a practice session, critical personnel meet to perform critical personnel meet to perform “dry run” of the emergency, “dry run” of the emergency, mimicking the response to true mimicking the response to true emergency as closely as possibleemergency as closely as possible
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 2020
Test a Contingency/Disaster Test a Contingency/Disaster Recovery Plan Recovery Plan (cont.)(cont.)
– ChecklistChecklist:: more passive type of testing, more passive type of testing, members of the key units “check off” the members of the key units “check off” the tasks on list for which they are tasks on list for which they are responsible. Report accuracy of the listresponsible. Report accuracy of the list
– Parallel testingParallel testing:: backup processing backup processing occurs in parallel with production services occurs in parallel with production services that never stop. If testing fails, normal that never stop. If testing fails, normal production will not be affected.production will not be affected.
– Full interruptionFull interruption:: production systems production systems are stopped as if a disaster had occurred are stopped as if a disaster had occurred to see how backup services performto see how backup services perform
Stephen S. YauStephen S. Yau CSE465-591, Fall 2006CSE465-591, Fall 2006 2121
ReferencesReferences
M. Merkow, J. Breithaupt, M. Merkow, J. Breithaupt, Information Security: Principles and Information Security: Principles and PracticesPractices,, Prentice Hall, August Prentice Hall, August 2005, 448 pages, ISBN 0131547291 2005, 448 pages, ISBN 0131547291
J. G. Boyce, D. W. Jennings, J. G. Boyce, D. W. Jennings, Information AssuranceInformation Assurance:: Managing Managing Organizational IT Security RisksOrganizational IT Security Risks. . Butterworth Heineman, 2002, ISBN Butterworth Heineman, 2002, ISBN 0-7506-7327-30-7506-7327-3