Disaster Recovery

IST 515: Learning by Doing (Theory and Practice)

Chao-Hsien Chu, Ph.D.
College of Information Sciences and Technology
The Pennsylvania State University
University Park, PA 16802
[email protected]


Objectives

• Describe the basic differences between business continuity planning (BCP) and disaster recovery planning (DRP).

• Describe the steps involved in creating and testing a disaster recovery plan.

• Identify and describe the various types of recovery strategies.

• Describe how to formulate a recovery strategy.

• Compare and contrast strategies for backup.

• Identify the advantages and disadvantages of mutual aid agreements.

• Compare and contrast the advantages and disadvantages of hot sites and cold sites.

• Compare and contrast the advantages and disadvantages of using service bureaus.

Readings

• Hansche, S., Berti, J. and Hare, C., Official (ISC)2 Guide to the CISSP Exam, Auerbach, 2004. Chapter 9 (Required).

• Swanson, M., Wohl, A., Pope, L., Grance, T., Hash, J., and Thomas, R., Contingency Planning Guide for Information Technology Systems, NIST Special Publication 800-34, June 2002.

• Wikipedia, Disaster recovery. http://en.wikipedia.org/wiki/Disaster_recovery

Disasters

BCP Cycle

Areas Covered in BCP

• Contact points. Who to contact during office hours, outside office hours, and in an emergency;

• Roles and responsibilities. A well-defined organizational structure for the business continuity and recovery teams;

• Risk levels. A categorization of business risks and the level of risk the organization deems acceptable;

• Continuity and recovery service levels. How much time is acceptable for responding to threats, implementing continuity plans, and recovering from failure scenarios;

Areas Covered in BCP (continued)

• Business continuity reviews. How and when the organization reviews business continuity plans;

• Business continuity processes. Processes and procedures that inform staff how to react to and handle particular failure scenarios;

• Incident reporting and documentation. Methods of recording and documenting incidents and responses to them;

• Testing. Acceptance criteria and testing requirements for the business continuity plan; and

• Training. Training requirements for staff involved in business continuity and disaster recovery processes.

Step 1: Initiate the BCP Project

1. Obtain and confirm support from senior management.

2. Identify key business and technical stakeholders.

3. Form a business continuity working group.

4. Define objectives and constraints.

5. Establish strategic milestones and draw up a road map.

6. Begin drafting the business continuity policy.

Step 2: Identify Business Threats

• Technology threats include natural disaster (such as flooding), fire, power failure, systems and network failure, systems and network flooding (when attackers try to overwhelm a network with traffic), virus attack, denial-of-service attack, theft, vandalism, and sabotage.

• Information threats come from hacking, theft, fraud, fabrication, alteration, misuse, natural disaster, fire, and the degradation of the ink on paper records.

• People threats include illness, recruitment shortfalls, resignation, compassionate leave, pregnancy, weather, and unavailability of transportation or office access.

Step 2: Identify Business Threats (continued)

1. Identify the community of business and technical stakeholders.

2. Conduct threat identification workshops.

3. Delineate and document business threats.

Step 3: Conduct a Risk Analysis

• Conduct risk analysis workshops.

• Assess the likelihood and impact of threat occurrence.

• Categorize and prioritize threats according to risk level.

• Review outputs of risk analysis with management.

• Ascertain level of risk acceptable to the organization.

• Document outputs in business continuity policy.
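Step 3 asks the team to assess the likelihood and impact of each threat and then categorize threats by risk level. One common way to do this, not prescribed by these slides, is a qualitative score equal to likelihood multiplied by impact. The sketch below assumes hypothetical 1 to 5 scales and illustrative cut-offs for "high", "medium", and "low"; `Threat`, `risk_score`, `risk_level`, and `prioritize` are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    likelihood: int  # hypothetical scale: 1 = rare ... 5 = almost certain
    impact: int      # hypothetical scale: 1 = negligible ... 5 = severe


LEVEL_ORDER = {"low": 0, "medium": 1, "high": 2}


def risk_score(threat: Threat) -> int:
    """Qualitative risk score: likelihood multiplied by impact (1 to 25)."""
    return threat.likelihood * threat.impact


def risk_level(score: int) -> str:
    """Map a score to a category; the cut-offs are illustrative assumptions."""
    if score >= 15:
        return "high"
    if score >= 8:
        return "medium"
    return "low"


def prioritize(threats: list[Threat], acceptable: str = "low") -> list[tuple[str, int, str]]:
    """Threats whose risk level exceeds the acceptable level, highest score first."""
    ranked = sorted(threats, key=risk_score, reverse=True)
    result = []
    for threat in ranked:
        score = risk_score(threat)
        level = risk_level(score)
        if LEVEL_ORDER[level] > LEVEL_ORDER[acceptable]:
            result.append((threat.name, score, level))
    return result


threats = [
    Threat("virus attack", 4, 4),
    Threat("power failure", 3, 4),
    Threat("flood", 2, 5),
    Threat("theft", 2, 2),
]
print(prioritize(threats))
# [('virus attack', 16, 'high'), ('power failure', 12, 'medium'), ('flood', 10, 'medium')]
```

The `acceptable` argument stands in for the level of risk management agrees to accept; anything above it is what the business continuity policy would need to address.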

Step 4: Establish the Business Continuity Team

• Identify key business, technical, and customer services stakeholders.

• Form and empower the business continuity team.

• Clarify and agree on team objectives and working mode.

• Define roles and responsibilities; produce a work plan.

• Identify incident engagement and response processes.

• Update business continuity policy.

Roles of BC Team

• A business continuity manager is the first point of contact, manages the incident, initiates the business continuity plan, mobilizes the business continuity team, and presents key decisions to business owners when appropriate.

• The business owner makes key decisions about how the business handles incidents.

• The technical services manager manages disruptions to technical services, such as IT infrastructure and applications; initiates continuity arrangements; and interacts with third-party business continuity service providers.

• An estate manager manages disruptions relating to buildings, offices, and the surrounding environment; initiates continuity arrangements; and interacts with third-party business continuity service providers.

Roles of BC Team (continued)

• The business operations and customer services manager manages disruptions to business operations and customer services; keeps customers informed if there is a noticeable impact on customer service levels; initiates continuity arrangements; and interacts with third-party business continuity service providers.

• Business continuity (or resumption) teams are technical, estate, or customer services teams that execute the business continuity plans.

• A recovery manager guides the business's recovery to normal operations.

Step 5: Design the Business Continuity Plan

• Identify critical and noncritical business services.

• Establish preferred business continuity service levels and profiles for continuity and recovery.

• Reaffirm key constraints (such as time and cost).

• For each threat, identify possible continuity strategies and evaluate them in terms of time, cost, and benefits.

• Identify and engage potential business continuity partners.

• Draft a set of continuity plans and work toward an agreed set of plans with senior management.

• Produce and execute an implementation plan.

Common Strategies

• Technology: Redundancy (of hardware and network, for example), maintenance and support agreements, and backup and restore capabilities are common defensive strategies.

• Information: Recover information by using data mirroring, backup and restore, auditing, and off-site or secondary data storage.

• People: To temporarily shore up people-related resources, use contract staff, rotas (staff schedules that a company can adjust in response to business demand or personnel shortfalls), call-out arrangements (having certain staff on standby to be called to work as necessary), rental offices and sites, manual procedures, and service-forwarding agreements (such as with specialist call centers).

Evaluation Criteria

• Costs for acquisition, deployment, testing, training, and associated management overhead;

• Level of protection;

• Business resumption response time; and

• Time to implement, including time for acquiring, deploying, and testing the business continuity strategy and for conducting relevant and necessary training.
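Step 5 asks for each candidate strategy to be evaluated against these criteria. A weighted score is one simple way to compare alternatives side by side. In this sketch the weights, the 1 to 5 rating scale (higher is always better, so a cost rating of 5 means cheapest), and the example ratings are all hypothetical; they are not evaluations of real options. `WEIGHTS`, `weighted_score`, and `rank_strategies` are illustrative names.

```python
# Hypothetical weights for the four criteria above; they sum to 1.0.
WEIGHTS = {
    "cost": 0.3,               # acquisition, deployment, testing, training, overhead
    "protection": 0.3,         # level of protection
    "response_time": 0.25,     # business resumption response time
    "time_to_implement": 0.15, # acquiring, deploying, testing, training
}


def weighted_score(ratings: dict[str, int]) -> float:
    """Ratings run from 1 (worst) to 5 (best) for every criterion."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    return sum(WEIGHTS[criterion] * ratings[criterion] for criterion in WEIGHTS)


def rank_strategies(candidates: dict[str, dict[str, int]]) -> list[tuple[str, float]]:
    """Candidate strategies sorted from best to worst weighted score."""
    scored = [(name, round(weighted_score(r), 2)) for name, r in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up ratings, for illustration only.
candidates = {
    "hot site": {"cost": 1, "protection": 5, "response_time": 5, "time_to_implement": 4},
    "cold site": {"cost": 5, "protection": 2, "response_time": 1, "time_to_implement": 3},
    "reciprocal agreement": {"cost": 4, "protection": 2, "response_time": 3, "time_to_implement": 2},
}
print(rank_strategies(candidates))
# [('hot site', 3.65), ('reciprocal agreement', 2.85), ('cold site', 2.8)]
```

Changing the weights is how an organization expresses its own constraints; a tight budget, for instance, would raise the weight on cost and could reorder the list.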

Step 6: Define Your Business Continuity Processes

• Identify, define, and document business continuity processes.

• Review and verify business continuity processes with relevant stakeholders.

• Identify training requirements.

• Develop training exercises, role-playing scripts, and simulation case studies.

• Initiate training and awareness programs.

Business Continuity Processes

• Handling specific failure events, such as fire and network failures;

• Backup and restoration of systems and business data;

• Virus management;

• Incident reporting;

• Problem escalation hierarchies;

• Customer and staff communication; and

• Contact procedures for third-party support providers.

Step 7: Test Your Business Continuity Plan

• Define business continuity acceptance criteria.

• Formulate the business continuity test plan.

• Identify major testing milestones.

• Devise the testing schedule.

• Execute tests via simulation and rehearsal; document test results.

• Assess overall effectiveness of business continuity plan; pinpoint areas of weakness and improvement.

• Iterate tests until the plan meets acceptance criteria.

• Check, complete, and distribute business continuity policy.
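Step 7 calls for acceptance criteria and for iterating tests until the plan meets them. The sketch below assumes one possible criterion: every rehearsed scenario must recover within its agreed service level. `RehearsalResult`, `failed_scenarios`, and `meets_acceptance` are hypothetical names, and real acceptance criteria may cover more than recovery time.

```python
from dataclasses import dataclass


@dataclass
class RehearsalResult:
    scenario: str
    target_hours: float    # agreed continuity/recovery service level for this scenario
    measured_hours: float  # recovery time actually achieved during the rehearsal


def failed_scenarios(results: list[RehearsalResult]) -> list[str]:
    """Scenarios whose measured recovery time exceeded the agreed target."""
    return [r.scenario for r in results if r.measured_hours > r.target_hours]


def meets_acceptance(results: list[RehearsalResult]) -> bool:
    """Assumed acceptance criterion: every rehearsed scenario meets its target."""
    return not failed_scenarios(results)


results = [
    RehearsalResult("network failure", target_hours=4, measured_hours=3.5),
    RehearsalResult("fire in data center", target_hours=24, measured_hours=30),
]
print(failed_scenarios(results))  # ['fire in data center']
print(meets_acceptance(results))  # False
```

A failing scenario points to the part of the plan to revise before the next test iteration; the loop ends when `meets_acceptance` holds for the agreed set of scenarios.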

Reasons for Testing BCP

• Validate the plan’s effectiveness in meeting your stated business continuity service levels;

• Identify, at an early stage, any shortcomings in the plan;

• Assess whether your business continuity service levels are realistic and achievable given your budgetary and time constraints; and

• Give senior management and other parties (such as regulatory bodies) confidence in the plan.

Step 8: Review Your Business Continuity Plan

• Develop a review schedule for different types of review.

• Arrange a business continuity review meeting or workshop.

• Update the business continuity document.

• Kick off another BCP cycle if necessary.

When to Review BCP

• Significant changes to the business—for example, the launch of new e-business operations;

• Changes in business priorities;

• Shifts in the legal or regulatory landscape;

• Significant world events (wars or terrorist attacks);

• Changes to the IT budget;

• Physical relocation of IT systems and operations;

• Outsourcing of IT systems and operations;

• Developments in IT infrastructure; and

• Significant changes in the labor market.

Common Pitfalls in BCP

Disaster Recovery

• Disaster recovery refers to the immediate and temporary restoration, within defined timeframes, of critical computing and network operations after a natural or man-made disaster.

• An organization should document how it will respond to a disaster, resume critical business functions within a predetermined period of time, minimize the amount of loss, and repair (or replace) the primary facility to resume data processing support.

Disaster Recovery Planning

• A comprehensive statement of consistent actions to be taken before, during, and after a disruptive event that causes a significant loss of information systems resources.

• The procedures for responding to an emergency, providing extended backup operations during the interruption, and managing recovery and salvage processes afterwards, should an organization experience a substantial loss of processing capability.

Disaster Recovery Planning (continued)

• To provide the capability to implement critical processes at an alternative site and return to the primary site and normal processing within a time frame that minimizes the loss to the organization, by executing rapid recovery procedures.

Goals and Objectives of DRP

• Protecting an organization from major computer services failure.

• Minimizing the risk to the organization from delays in providing services.

• Guaranteeing the reliability of standby systems through testing and simulation.

• Minimizing the decision-making required by personnel during a disaster.

Disaster Recovery Procedures

• The recovery team.

• The salvage team.

• Normal operations resume.

• Other recovery issues:

– Interfacing with external groups

– Employee relations

– Fraud and crime (vandalism and looting)

• Financial disbursement.

• Media relations.

Recovery Strategies

Recovery Strategies

• Recovery strategies consist of a set of predefined, management-approved actions implemented in response to an unacceptable business interruption.

• The focus is on recovery methods to meet the predetermined recovery timeframes established for the operation and functioning of the critical business functions.

• Developing the recovery strategies includes compiling the resource requirements and identifying the alternatives available during recovery.

Sample of Business Unit Priorities

Business Unit      Recovery Window (hrs)   IT Platforms           Priority
IT Security        2                       Mainframe, LAN, WAN    1
Facilities         2                       LAN, WAN               1
Legal              36                      LAN, WAN               3
Administrative     18                      LAN, WAN               2
Accounting         48                      LAN, WAN               3
Human Resources    48                      LAN, WAN               3
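In the sample business unit priorities, priority tracks the recovery window: the shorter the window, the higher the priority. The sketch below encodes that rule with tier boundaries of 8 and 24 hours. Those boundaries are an assumption chosen because they reproduce the sample priorities; they are not values given in the source. `priority_for` and `recovery_order` are illustrative names.

```python
# Recovery windows (hours) from the sample business unit priorities.
RECOVERY_WINDOWS = {
    "IT Security": 2,
    "Facilities": 2,
    "Legal": 36,
    "Administrative": 18,
    "Accounting": 48,
    "Human Resources": 48,
}

# Hypothetical tier boundaries: (longest window in hours, priority).
TIERS = [(8, 1), (24, 2)]
DEFAULT_PRIORITY = 3  # anything with a window longer than the last tier


def priority_for(window_hours: float) -> int:
    """Shorter recovery windows map to higher (numerically smaller) priority."""
    for max_hours, priority in TIERS:
        if window_hours <= max_hours:
            return priority
    return DEFAULT_PRIORITY


def recovery_order(windows: dict[str, float]) -> list[tuple[str, float, int]]:
    """Business units sorted so the shortest recovery window is restored first."""
    return sorted(
        ((unit, hours, priority_for(hours)) for unit, hours in windows.items()),
        key=lambda row: row[1],
    )


for unit, hours, priority in recovery_order(RECOVERY_WINDOWS):
    print(f"priority {priority}: {unit} ({hours} h)")
```

Because Python's sort is stable, units with equal windows (IT Security and Facilities, for example) keep their listed order; a real plan would break such ties explicitly.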

Steps in Developing Recovery Strategies

• Document all costs associated with each alternative.

• Obtain costs for any outside services.

• Develop written agreements.

• Evaluate risk reduction and resumption strategies based on a full loss of the facility.

• Identify risk reduction measures and revise resumption priorities and timeframes.

• Document recovery strategies and present them to management for comments and approval.

Recovery Strategies

Strategies should address recovery of:

• Business operations

• Facilities and supplies

• Users (workers and end users)

• Technical (network, telecommunication, data center)

• Data (off-site backups of data and applications)

Business Recovery Strategies

• Business recovery strategies focus on critical resources and the maximum tolerable downtime (MTD) for each business function.

• The business unit priorities are taken directly from the business impact analysis (BIA). The length of the recovery window for each business unit dictates its recovery priority.

These strategies involve identifying the following:

• Critical business units and their associated business functions.

• Critical IT system requirements for each business function.

• Procedures for connectivity to IT infrastructures (e.g., mainframe, mini, LAN, WAN).

Business Recovery Strategies (continued)

These strategies involve identifying the following:

• Critical equipment and supply requirements for each business function.

• Essential office space requirements of each business unit.

• Key personnel for each business unit.

• Redirection of postal service mail, voice telecommunications, and data networks to the recovery site.

• Business unit interdependencies with other units.

• Off-site storage (procedures, media, documents).

• Vendor services.

Facility and Supply Recovery Strategies

• Facility recovery involves identifying recovery procedures for the alternate facility, including space, security, fire protection, infrastructure, utility, supply, and environmental requirements.

• Determine minimum space for recovery of critical business units.

• Determine space needs for less critical resources.

• Determine security needs at recovery sites.

• Determine fire protection needs.

• Determine critical furnishings and office equipment.

• Determine infrastructure requirements.

• Determine utility and environmental needs.

• Determine what office and business supplies are needed.

User Recovery Strategies

• The strategies involved with personnel requirements focus on manual procedures, vital records, and restoration procedures. A critical component is establishing methods to implement the process and maintain the records so that information can be easily and accurately updated to the electronic format when service is restored.

The plan should specify the following:

• Manual procedures.

• Vital record storage (e.g., medical, personnel).

• Employee notification procedures.

• Employee transportation arrangements.

• Employee accommodations.

Technical Recovery Strategies

• Technical recovery strategies define alternate recovery strategies for the data center and network infrastructure components.

Methods:

• Subscription services.

• Mutual aid agreements.

• Redundant data centers.

• Service bureaus.

Subscription Services

• Subscription services provide an alternate facility or “site” for recovery. They are characterized as hot, warm, cold, mirror, and mobile sites.

• Hot Site. A fully configured site with the complete customer-required hardware and software, provided by the service.

• Warm Site. Similar to a hot site, but the expensive equipment (e.g., the mainframe) is not available on-site. The site can be ready within hours after the needed equipment arrives.

Subscription Services (continued)

• Cold Site. Does not include any technical equipment or resources, except environmental support such as air conditioning, power, telecommunication links, raised floors, etc.

• Mirror Site. Also referred to as full redundancy, a mirror site is a computer service facility equipped with utilities, communication lines, and appropriate hardware; it is fully operational and processes each transaction along with the primary site.

• Mobile Site. A trailer that can be set up and linked by a trailer sleeve to create a space that suits the subscriber’s recovery needs.
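The site types trade recovery speed against cost: the more equipment a site keeps ready, the faster it can take over and the more it costs. The sketch below picks the least expensive site type whose activation time fits a unit's recovery window. The activation times are rough, hypothetical figures chosen only to illustrate the ordering (mirror fastest, cold slowest), and the cost ranking is relative only. Mobile sites are left out because their readiness depends on delivery and setup. `SITES` and `cheapest_site` are illustrative names.

```python
from typing import Optional

# Hypothetical figures: real activation times depend on the contract and vendor.
# cost_rank is relative only: 1 = cheapest, 4 = most expensive.
SITES = {
    "cold": {"activation_hours": 7 * 24, "cost_rank": 1},  # equipment must be installed
    "warm": {"activation_hours": 48, "cost_rank": 2},      # waits for key equipment
    "hot": {"activation_hours": 4, "cost_rank": 3},        # fully configured
    "mirror": {"activation_hours": 0, "cost_rank": 4},     # already processing in parallel
}


def cheapest_site(recovery_window_hours: float) -> Optional[str]:
    """Least expensive site type whose assumed activation time fits the window."""
    feasible = [
        (info["cost_rank"], name)
        for name, info in SITES.items()
        if info["activation_hours"] <= recovery_window_hours
    ]
    return min(feasible)[1] if feasible else None


for window in (2, 8, 72, 200):
    print(window, cheapest_site(window))
```

Under these assumptions a 2-hour window can only be met by a mirror site, while a unit that can wait more than a week could rely on a cold site.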

Reciprocal or Mutual Aid Agreements

• This strategy establishes reciprocal or mutual aid agreements with other companies, under which each company provides facilities to the other in the event of a disaster.

• Reciprocal agreements require the companies to have similar hardware and software computing environments.

• Typically, reciprocal agreements are dismissed in practice because few information system facilities have the extra capacity needed to run both their own and another organization’s needs for any extended period of time.
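The capacity objection can be made concrete with simple arithmetic: the host must run its own work plus the partner's critical work on one facility. The sketch below uses a hypothetical capacity unit (for example, processing hours per day) and made-up numbers; `Site` and `headroom` are illustrative names, and the option to shed noncritical work is an assumption about how a host might try to make room.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    capacity: float          # hypothetical unit, e.g., processing hours per day
    critical_load: float     # work that must keep running during a disaster
    noncritical_load: float  # work that could be deferred for a while


def headroom(host: Site, partner: Site, shed_noncritical: bool = False) -> float:
    """Capacity left on the host after it absorbs the partner's critical load.

    A negative result means the host cannot honor the reciprocal agreement.
    """
    own_load = host.critical_load + (0 if shed_noncritical else host.noncritical_load)
    return host.capacity - own_load - partner.critical_load


a = Site("Company A", capacity=100, critical_load=50, noncritical_load=30)
b = Site("Company B", capacity=100, critical_load=45, noncritical_load=35)
print(headroom(a, b))                         # -25
print(headroom(a, b, shed_noncritical=True))  # 5
```

Even in this made-up case, the agreement works only if Company A defers its own noncritical work, which is hard to sustain for an extended period.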

Technical Recovery Strategies

Redundant Processing Centers:

• Expensive

• May not have enough spare capacity for critical operations

Service Bureaus:

• Many clients share facilities

• Almost as expensive as a hot site

• Must negotiate agreements with other clients

Data Recovery Strategies

• The objectives are to back up critical software and data, store the backups at an off-site location, and retrieve the backups quickly during a recovery operation.

• Backups of data and applications

• Off-site vs. on-site storage of media

• How fast can data be recovered?

• How much data can you lose?

• Security of off-site backup media

• Types of backups (full, incremental, differential, etc.)
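The difference between backup types shows up at restore time. An incremental backup captures changes since the previous backup, so every incremental since the last full backup must be applied in order; a differential backup captures all changes since the last full backup, so only the newest one is needed. This minimal sketch assumes each full backup is followed only by incrementals or only by differentials and rejects mixed schemes; `restore_chain` is a hypothetical helper.

```python
VALID_KINDS = {"full", "incremental", "differential"}


def restore_chain(history: list[tuple[str, str]]) -> list[str]:
    """Backup sets to restore, oldest first, to reach the most recent backup.

    history: chronological list of (label, kind) pairs.
    """
    if any(kind not in VALID_KINDS for _, kind in history):
        raise ValueError("unknown backup kind")
    full_positions = [i for i, (_, kind) in enumerate(history) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup to restore from")
    start = full_positions[-1]          # restores always begin at the latest full
    since_full = history[start + 1:]
    kinds = {kind for _, kind in since_full}
    if len(kinds) > 1:
        raise ValueError("mixed incremental/differential schemes are not handled here")

    chain = [history[start][0]]
    if kinds == {"incremental"}:
        # Each incremental holds only the changes since the previous backup.
        chain += [label for label, _ in since_full]
    elif kinds == {"differential"}:
        # Each differential holds every change since the full backup.
        chain.append(since_full[-1][0])
    return chain


incremental_week = [("Sun", "full"), ("Mon", "incremental"), ("Tue", "incremental"), ("Wed", "incremental")]
differential_week = [("Sun", "full"), ("Mon", "differential"), ("Tue", "differential"), ("Wed", "differential")]
print(restore_chain(incremental_week))   # ['Sun', 'Mon', 'Tue', 'Wed']
print(restore_chain(differential_week))  # ['Sun', 'Wed']
```

This reflects the trade-off behind "How fast can data be recovered?": incrementals are quicker to take but need more sets at restore time, while differentials grow each day but restore from two sets. For "How much data can you lose?", anything written after the most recent backup is lost, so the backup interval bounds the worst-case loss.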

Recovery Management

• This is sometimes referred to as crisis management. Essentially, it is the overall coordination of the organization’s response to a crisis.

• The goal is to deal with the issues in an effective and timely manner and avoid or minimize damage to the organization’s profitability, reputation, and ability to operate.

• The flow of accurate information is a key ingredient to effective crisis management. The effective management of information can serve as the first line of defense against a crisis and can also be the most effective mechanism in the process of restoring both the business functions and public confidence.

Testing the Disaster Recovery Plan

• To verify the accuracy of the recovery procedures and identify deficiencies

• To prepare and train personnel to execute their emergency duties

• To verify the processing capability of the alternative backup site

Testing DRP

Creating the Test Document:

• Testing schedule and timing

• The duration of the test

• The specific test steps

• Who will participate in the test

• The task assignments of the test personnel

• The resources and services required (supplies, hardware, software, documentation, and so forth)
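The test document items map naturally onto a small record type. This sketch is a hypothetical structure (the class, field names, people, and dates are invented for illustration) with a few completeness checks, such as making sure every participant has a task assignment before the test is approved.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DRPTestPlan:
    scheduled_start: datetime    # testing schedule and timing
    duration: timedelta          # duration of the test
    steps: list[str]             # specific test steps
    participants: list[str]      # who will participate
    assignments: dict[str, str]  # participant -> task assignment
    resources: list[str] = field(default_factory=list)  # supplies, hardware, software, documentation

    def scheduled_end(self) -> datetime:
        return self.scheduled_start + self.duration

    def problems(self) -> list[str]:
        """Basic completeness checks; an empty list means none were found."""
        issues = []
        if not self.steps:
            issues.append("no test steps defined")
        if self.duration <= timedelta(0):
            issues.append("duration must be positive")
        unassigned = [p for p in self.participants if p not in self.assignments]
        if unassigned:
            issues.append(f"participants without a task: {', '.join(unassigned)}")
        outsiders = [p for p in self.assignments if p not in self.participants]
        if outsiders:
            issues.append(f"tasks assigned to non-participants: {', '.join(outsiders)}")
        return issues


plan = DRPTestPlan(
    scheduled_start=datetime(2024, 3, 2, 8, 0),
    duration=timedelta(hours=6),
    steps=["Restore payroll database at alternate site"],
    participants=["Alice", "Bob"],
    assignments={"Alice": "restore lead"},
    resources=["backup media"],
)
print(plan.scheduled_end())  # 2024-03-02 14:00:00
print(plan.problems())       # ['participants without a task: Bob']
```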

Five DRP Test Types

• Checklist

• Structured walk-through

• Simulation

• Parallel

• Full-interruption
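The five test types are commonly presented in order of increasing realism and disruption. The enumeration below records that ordering; the comments summarize the usual meaning of each type, and `halts_production` and `uses_alternate_site` are illustrative helpers, not part of any standard.

```python
from enum import IntEnum


class DRPTest(IntEnum):
    """The five DRP test types, from least to most disruptive."""
    CHECKLIST = 1               # copies of the plan are distributed for review
    STRUCTURED_WALKTHROUGH = 2  # business unit representatives walk through the plan together
    SIMULATION = 3              # a scenario is rehearsed up to, not including, relocation
    PARALLEL = 4                # critical systems run at the alternate site alongside the primary
    FULL_INTERRUPTION = 5       # normal production is shut down and operations move


def halts_production(test: DRPTest) -> bool:
    # Only a full-interruption test takes the primary site offline.
    return test is DRPTest.FULL_INTERRUPTION


def uses_alternate_site(test: DRPTest) -> bool:
    # Parallel and full-interruption tests actually process at the alternate site.
    return test >= DRPTest.PARALLEL


for test in DRPTest:
    print(test.name, uses_alternate_site(test), halts_production(test))
```

The ordering is why organizations typically work up through the cheaper, safer tests before attempting a full interruption.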
