58
Business continuity and disaster Recovery

Business Continuity and Disaster

Embed Size (px)

DESCRIPTION

business continuity and disaster

Citation preview

Page 1: Business Continuity and Disaster

Business continuity and disaster

Recovery

Page 2: Business Continuity and Disaster

• Data storage and management aims at performance, availability and recovery improvement.

• There are four key critical service areas that each application requires:

• Primary Storage with focus on key performance• Operational Recovery with focus on recovery time and point

objectives from infrastructure failures and data corruption• Disaster Recovery with focus on recovery time and point objectives

from complete data center loss or failures• Archival Management with focus on compliance, access and recall

requirements• There are numerous situations, both planned and unplanned, that

will affect the availability of your system.

Page 3: Business Continuity and Disaster

Planned Downtime Unplanned Downtime

Scheduled, routine servermaintenance2. Hardware and/or software upgrades3. Physical server re-location

1. System crash caused by hardware and/or software failure2. Database corruption3. System compromised by virus(es), malicious attacks, or careless misuse

Page 4: Business Continuity and Disaster

• But with a little planning, you can maximize system uptime.

• This is the essence of Disaster Recovery Planning and Business Continuity Planning.

Page 5: Business Continuity and Disaster

Business Continuity and Disaster Recovery• A disaster is defined as a sudden, unplanned

catastrophic event that renders the organizations ability to perform mission-critical and critical processes, including the ability to do normal production processing of systems that support critical business processes.

• A disaster could be the result of significant damage to a portion of the operations, a total loss of a facility, or the inability of the employees to access that facility.

Page 6: Business Continuity and Disaster

Enterprise Operations Cycle of Disaster Recovery

Page 7: Business Continuity and Disaster

Business Continuity

• Business Continuance ensures availability of the functions that are critical to the day-to-day business environment in the event of a natural disaster, hardware failure, or application failure.

• Disaster Recovery is a reactive plan to resume IT operations after a failure.

• Business continuance is a proactive plan to maintain business operations despite planned or unplanned outages.

• Planning for business continuance requires careful analysis of network requirements and typically requires developing some or all of the following documentation:

Page 8: Business Continuity and Disaster

• Business continuance plan—defines the resources, actions, tasks, and data required to manage the business recovery process in the event of a business interruption.

• Data (or application) classification study—determines the value of each application to the business and assigns a service level to each, such as mission critical, business critical and critical.

• Applications that supply data to critical applications must also be identified.• Business impact analysis (or assessment)—defines the effect on the

organization when an application becomes unavailable.

• A business impact assessment identifies the risks and potentialcosts associated with application downtime, data loss, and disrupted user access.

Page 9: Business Continuity and Disaster

Disaster Recovery vs. Business Continuity• Terms are often used interchangeably. • Different but complementary components of a

business's overall recovery and continuity planning. Disaster Recovery Planning (DRP) is concerned with the recovery of systems and infrastructure components, Business Continuity Planning has a larger scope – namely, the determination of which business components and functions need to be recovered – and those which can be ignored.

• Briefly, Disaster Recovery is the coordinated process of restoring systems, data, and infrastructure required to support key on-going business operations.

Page 10: Business Continuity and Disaster

• Disaster Recovery is the coordinated process of restoring systems, data, and infrastructure required to support key on-going business operations.

• DISASTER RECOVERY PLANNING (DRP): The technological aspect of business continuity planning. The advance planning and preparations that are necessary to minimize loss and ensure continuity of the critical business functions of an organization in the event of disaster.

• BUSINESS CONTINUITY PLANNING (BCP): Process of developing advance arrangements and procedures that enable an organization to respond to an event in such a manner that critical business functions continue with planned levels of interruption or essential change. (Source: www.drii.org)

Page 11: Business Continuity and Disaster

DRP

• A DR plan is usually limited in scope to a set of defined IT systems and infrastructure, with the ultimate goal being the complete recovery of those systems and infrastructure within a defined timeframe and with minimal data loss. The DR plan may exclude non-IT business units – such as Accounting, Marketing, Sales, etc., except in terms of recovery of the defined applications used by these business units.

Page 12: Business Continuity and Disaster

• Contrast this with a BC plan that has as its scope the entire enterprise, with the ultimate goal being recovery of mission-critical/core business functions to ensure the survival of the enterprise.

• The business functions to be recovered in a BC extend beyond IT systems – for example, BC concerns itself not only with the recovery of applications that support sales, but also with the recovery of the related infrastructure (such as office space), supplies (such as marketing materials and manual forms), etc.

Page 13: Business Continuity and Disaster

Disaster Recovery Planning

• Concerned with the recovery of a key set of IT Systems and infrastructure components.

• The process of Business Continuity Planning is concerned with the enterprise as a whole, dealing with business functions rather than application systems. Disaster Recovery (DR) can be thought of as the foundation of Business Continuity (BC); a working DRP is a core component of any Business Continuity effort.

Page 14: Business Continuity and Disaster

• Since both DR and BC address the same basic problem – recovery from an unforeseen event and continuation of the business, there are a number of areas with overlap between the two processes.

• Many companies charter their DR effort as part of their Business Continuity effort.

• However, just as many will first build a working DRP before taking the net logical step and constructing an overall BCP.

• With an understanding of the relationship between the two processes, knowledge gained in one area can be leveraged

• in the other.

Page 15: Business Continuity and Disaster

Defining Recovery

• Define what is meant by “Recovery.”• From the IT perspective, recovery will usually mean

establishing support for the processing and communications functions considered critical by the business community, and then establishing support for ancillary systems.

• From the business perspective, recovery will mean being able to execute the business functions that are at the core of the business, and then being able to execute ancillary functions.

Page 16: Business Continuity and Disaster

• Another key factor is the timeframe. Is the goal to have all systems up and running within a day? A week?

• Is it enough to only bring up a few key systems within the first week, while taking longer to restore others? This factor is often expressed as either the “Recovery Time Objective”

• (RTO) or the “Service Delivery Objective” (SDO). This refers to the amount of time that can elapse from the failure to the time when the systems or services are available for use. Most often this factor can vary by system or service; for example, a company’s order processing system may have a SDO of 24 hours, while the company’s intranet has an SDO of 1 week.

Page 17: Business Continuity and Disaster

RTO and SDO• What is the difference between RTO and SDO? It is possible to

recover one or more “systems” in a short• period of time, but due to an unplanned dependency (often

unknown until testing the DRP) or recovery• failure it is impossible to provide the full and necessary functionality

to restore service. Hence, while having• a good RTO is important, understanding SDO and planning around

that goal is even more important. • A single failure of a key component that other systems are

dependent on can result in failures of many other• systems. Because of this it is important to understand the

dependencies within your environment.

Page 18: Business Continuity and Disaster

RPO

• How much data can you afford to lose? This is expressed as the “Recovery Point Objective” or RPO.

• Depending on the environment, the loss of any data could have a significant impact. What's better a lower or a higher RPO?

Page 19: Business Continuity and Disaster

• It is also important to have some way of tracking the various components of a recovery effort such as using a “Critical Success Factors” spread sheet that breaks a recovery effort down by key area, and is weighted by system criticality, downstream dependencies and impact of failure.

• As systems are recovered, the spread sheet is updated.

Page 20: Business Continuity and Disaster

Benefits:

• In a DR test scenario, this spread sheet provides the overall “score” based on the recovery test.

• This score can then be used as a tangible way to track progress and identify key deficiencies in the overallDisaster Recovery Plan, both for the current test and as part of a baseline metric to track improvement over time.

• In an actual recovery effort, the spread sheet provides a “checklist” that can be reviewed with management to see the overall progress of the DR effort.

• It also clearly draws out the ramifications of a failed system through the weighted scoring and dependency lists, as well as to help identify timings with regards to RTO/SDO and RPO.

Page 21: Business Continuity and Disaster

Disaster Recovery Planning

• A Disaster Recovery Planning project cannot be completed in a week or even a month.

• In many ways, a DRP is never completed – the plan must be tested and updated at least once per year, if not more frequently. A Plan that does not keep pace with the changes in your organization is a disaster in itself, providing a false sense of security.

• The primary objectives of a Disaster Recovery Plan are to guide an organization in the event of a disaster and to effectively re-establish critical business operations within the shortest possible period of time with a minimal loss of data.

Page 22: Business Continuity and Disaster

• The goals of the planning project are to assess current and anticipated vulnerabilities, define the requirements of the business and IT communities, design and implement risk mitigation procedures, and provide the organization with a plan that will enable it to react quickly and efficiently at the time of a disaster.

Page 23: Business Continuity and Disaster

• A DRP first needs to be created, then tested and refined, and finally implemented and tested on a periodic basis.

• While this can be an expensive and time-consuming effort, it can prove to be a bargain in the long run

• if some type of disaster occurs. • Most plans will experience some type of failure during

their first execution so it is very important to test the plan.

• Systems and personnel change, so it is also important to retest the plan on a periodic basis, ideally rotating staff.

Page 24: Business Continuity and Disaster

DRP address 3 main functional areas

• 1. Recovery: Once the infrastructure is in place it will be necessary to recover production data. Since recovery may not be up to the point of failure, it is important to identify any processing that needs to be redone. Can all of the data feeds to the system be identified? How many of them can be redone with 100% certainty of success? It is important to minimize "holes" in data (especially in a distributed processing environment where one step could be dependent on one or more predecessor steps or actions), and then to identify the action to be taken when data inconsistencies are detected.

• There should be an audit trail for all work performed during this phase. Once the data is recovered there should be some type of validation process to ensure that the recovery was complete, leaving a consistent work environment.

Page 25: Business Continuity and Disaster

2. Restoring / Sustaining Business Operations:

• All processing requirements and service level agreements need to be defined and documented. Dependencies between processes also need to be defined. It is important to document the existing process and then build the plan accordingly.

• Anything that ran before (in production) will probably need to run again (at the hot site), so

scheduling and dependency information is critical. Remember that routine maintenance (including backups) should still be performed (it too is an asset that requires protection).

Page 26: Business Continuity and Disaster

3. Transferring Data back to Production Machines

• This is one area that is generally omitted from a DRP, but which is very important.

• Eventually production will need to back to normal. • A process needs to be defined to manage this migration.• Often the Client will elect to execute the DRP on the production

machines in order to synchronize the machines to a specific point in time.

• It is one of the more difficult tasks to test.

Page 27: Business Continuity and Disaster

The DRP should address 3 main technical areas

• 1. Hardware Issues: This includes machine type (especially an issue when using equipment that is older or not as popular), configuration (disk capacity, peripheral devices, device names, RAM, file systems and volume groups, OS users, etc.) and operating system version and patch level (hopefully it is a current version in case vendor support is required).

• Another issue is deciding whether to use an existing preconfigured machine or to completely configure a machine (load the OS, initialize and configure disks, TCP/IP configuration, SCSI addresses, everything). There are pros and cons to each scenario. It is recommended that organizations plan for the worst case (i.e., the complete rebuild).

Page 28: Business Continuity and Disaster

• Note: It may be possible to reconstruct the production machine on a new machine using a tape backup.

• This method does not leave much room for flexibility relative to hardware configuration, but is very fast when compared to a manual system reconstruction.

• The key to success is to ensure that DRP machines have at least as much capacity as the production machines that they are replacing, that they are compatible architectures, and that "someone" has the installation media for the OS.

• These machines usually either resides at a remote location (in which case they can be pre-configured) or are provided as-needed by a company that specializes in providing facilities and equipment.

Page 29: Business Continuity and Disaster

Networking Issues• The environment most likely consists heterogenous

machines.• Is any special type of LAN or VPN software required?• How do the machines communicate with one

another?• Are there requirements for connections to an

external network (WAN, Internet, Extranet)?• Is there any other type of Client/Server or n-tier

activity that will need to be supported?• All networking requirements and issues need to be

identified, documented, and then addressed in the DRP.

Page 30: Business Continuity and Disaster

Software Issues• Software includes the operating System, user written

applications, and third party software (RDBMS, report writers, GUI products, backup/recovery products, scheduling software, etc.).

• A comprehensive inventory of currently used software, including current version, license information, and support contact information is essential. This is what runs your business. Working hardware and an accessible network is worthless if your critical applications are not working!

Page 31: Business Continuity and Disaster

Procedures/Methods in planning disaster recovery.

• 1. Identification and Analysis of Disaster Risks/Threats

• The first step is to identify the threats or risks. Risk analysis (sometimes called business impact analysis) involves evaluating existing physical and environmental security and control systems, and assessing their adequacy with respect to the potential threats. The risk analysis process begins with a list of the essential functions of the business. This list will set priorities for addressing the risks.

Page 32: Business Continuity and Disaster

2. Classification of Risks Based on Relative Weights

Categorize risk into different classes to accurately prioritize them.

In general, risks can be classified in the following five categories.

• 3. Evaluation of Disaster Recovery Mechanisms• Determine the best suitable recovery method for each.

The main factors that need to be considered are:• · Cost of deployment, maintenance, and operation• · Recovery time• · Ease of recovery activation and operation

Page 33: Business Continuity and Disaster

• 3. Setting up a Disaster Recovery Committee• This committee should have representation

from all the different company agencies with a role in the disaster recovery process, typically management, finance, IT electrical department, security department, human resources, vendor management, and so on. The Disaster Recovery Committee creates the

disaster recovery plan and maintains it.

Page 34: Business Continuity and Disaster

5. Disaster Recovery Phases• Disaster recovery happens in the following sequential phases:• 1. Activation Phase: Quick and precise detection and

communication the key for reducing the effects of the incoming emergency; in some cases it may give enough time to allow system personnel to implement actions gracefully, thus reducing the impact of the disaster. In this phase, the disaster effects are assessed and announced.

• a. Notification procedures• b. Damage assessment• c. Disaster recovery activation planning• 2. Execution Phase: In this phase, the actual procedures to

recover each of the disaster affected entities are executed. Business operations are restored on the recovery system.

• 3. Reconstitution Phase: In this phase the original system is restored and execution phase procedures are stopped.

Page 35: Business Continuity and Disaster

The Disaster Recovery Plan Document

• The outcome of the disaster recovery planning process is the disaster recovery plan document. will be the source of information for disaster recovery procedures. The disaster recovery plan document is the only reliable source of information for the disaster recovery during an emergency. It should be very easily readable, with simple and detailed instructions. Should be kept up to date with the current organization environment. Periodic Mock Drills and periodic updates are recommended for the maintenance of the plan documentation.

Page 36: Business Continuity and Disaster

Disaster Recovery Planning Project Steps

• Step I – Project Initiation• The objectives of the disaster recovery planning

project initiation are to gain an understanding of the existing and planned future IT environment of the organization, define the scope of the project, develop the project schedule, and identify risks to the project. Input from the BCP team will be required as part of the requirements analysis phase.

• In addition, a Project Sponsor/Champion and Steering Committee should be established during this phase.

Page 37: Business Continuity and Disaster

Step II – Assessment of Disaster Risk

• This should include, but not be limited to, an assessment of geographical location, building composition, computing environment/physical plant security, installed security devices (including automated fire extinguishers and automated shut-down devices), computing environment/physical plant access control systems and software, personnel practices, operating practices, and backup practices. This is a good time to perform an IT Assessment, Practices and Procedures Audit, and Single Points of Failure Analysis.

Page 38: Business Continuity and Disaster

Step III – Business Impact Analysis

Key business units that are supported by the IT undertaken to identify which systems and functions critical to the continuation of business, and to determine the length of time that those units can survive without the critical systems in operation.

This analysis is essential to making decisions about how to implement disaster recovery.

Page 39: Business Continuity and Disaster

Step IV – Definition of Requirements

• All requirements of, and relating to, the Plan must be defined and detailed. These include, the recovery requirements of the business and IT communities, the requirements generated by the business impact analysis, and the requirements generated by the assessment of disaster risk and the mitigation of disaster risk.

Page 40: Business Continuity and Disaster

Step V – Project Planning

• It is important here to distinguish between the Project Plan and the Disaster Recovery Plan. The Project Plan in this case will define the project that is being executed and as one of its objectives will develop the Disaster Recovery Plan. An additional objective of this project is to mitigate as much disaster risk as possible.

Page 41: Business Continuity and Disaster

• VI – Project Execution• The project should proceed according to standard practices of Project

Management. During the project the identified methods of mitigating risk will be executed, and the Disaster Recovery Plan will be constructed

• and tested.• Step VII – BCP Integration• The DR Plan needs to integrate back into the organization’s overall

Business Continuity efforts. For an organization that has run the DR effort as part of an overall BC effort, this has likely already been done.

• However, for an organization that builds their DRP first and then creates a BCP from that foundation it is important to align the two.

Page 42: Business Continuity and Disaster

• Step VIII – On-going Maintenance and Integration

• Part of the plan will include the on-going maintenance and testing efforts required to keep the Plan up to date, as well as processes to identify and mitigate future risks as they are encountered.

Page 43: Business Continuity and Disaster

What is BCP?

• Disasters test the contingency plans of businesses to an unanticipated degree. Companies that have business

• continuity plans and contracts in place with vendors of recovery services are able to continue business at

• alternate sites with minimum downtime and minimum loss of data. Recovery itself must be speedy (under 24

• hours) for high-availability systems— and the facilities must provide continuity not only of the data centre

• but also of all critical aspects of its clients’ businesses.

Page 44: Business Continuity and Disaster

• At the most basic level, Business Continuity Planning (BCP) can be defined as an iterative process that is designed to identify mission critical business functions and enact policies, processes, plans and procedures to ensure the continuation of these functions in the event of an unforeseen event. All activity surrounding the creation, testing, deployment, and maintenance of a BCP can be viewed in terms of this definition.

Page 45: Business Continuity and Disaster

• The other main difference between a DR and BC concerns the definition of what to recover and what to exclude. Business Continuity requires the definition and determination of response to risk (Risk Analysis and Response), the definition of possible failure areas (a Single Points of Failure, or SPOF analysis), and the determination of the impact of these areas on the business as a whole (Business Impact Analysis, or BIA).

• This analysis will result in the determination of what business functions are "core" or "mission critical" – these are the business functions that are essential for the survival of the enterprise, and will by necessity be the focus of the BC effort.

Page 46: Business Continuity and Disaster

• Another key area within the scope of the BCP is concerned with data, system, and application dependencies. The failure to identify, plan for, and properly recover systems and processes in light of these dependencies could very well keep a business unit from operating properly. This scenario highlights the importance of these issues, and underscores why they need to be identified early in the BC process and addressed accordingly.

Page 47: Business Continuity and Disaster

• If an enterprise has a working DR plan or plans, it makes more sense to leverage that knowledge into the creation of the BC plan. In the second case, the integration of the DR plan into the new BC plan is of utmost importance. There is also the chance that portions of the existing DR plan or plans may need to be reworked or become obsolete based on the overall allocation of resources determined by the BC effort.

Page 48: Business Continuity and Disaster

• Unlike a DRP (which can be owned and managed at the department level), the scale, cost, and impact of a

• true BCP are at the enterprise level and needs to be managed as such. Because of it is logical connection to

• DRP, many companies BCP plans fall under the responsibility of the IT department – however, since

• Business Recovery comprises many areas that are outside of the scope of the IT department (such as Legal Counsel, Supply Chain, and external agency liaisons), this is often a flawed assumption.

Page 49: Business Continuity and Disaster

The Phases of Business Continuity Planning, Implementation, and Management

• The significance of each major phase of continuity planning merits attention because each phase contributes to building all four areas of business continuity: disaster recovery, business recovery, business resumption, and contingency planning:

Page 50: Business Continuity and Disaster

• Phase 1— Establish the foundation. These alignment and analysis steps are necessary to obtain executive sponsorship and the commitment of resources from all stakeholders. Without a basis of business impact analysis and risk assessment, the plan cannot succeed and may not even be developed.

• Phase 2— Develop and implement the plan. Here, attention to detail and active participation by all stakeholders ensure the development of a plan worth implementing. The plan itself must include the recovery strategy with all of its detailed components and the test plan.

• Phase 3— Maintain the plan. The best plan is only as effective as it is current. Every tactic of business resumption and recovery must be kept up to date and tested regularly.

Page 51: Business Continuity and Disaster

Types of Plans

• The separate plans that make up a business continuity plan include:

• • Disaster recovery plan— to recover mission-critical technology and applications at an alternate site.

• • Business resumption plan— to continue mission-critical functions at the production site through work-arounds until the application is restored.

• • Business recovery plan— recover mission-critical business processes at an alternate site (sometimes called “workspace recovery”).

• • Contingency plan— to manage an external event that has far-reaching impact on the business.

Page 52: Business Continuity and Disaster

Business Use of BCP

• The need for high availability of systems today approaches 24x365 across all industries, for both service and manufacturing organizations. Despite that fact, the relatively high percentage of companies without business continuity plans indicates that strategic planners may be relying on a combination of insurance and outsourced physical recovery sites to take the required steps independently, absent of any cohesion.

Page 53: Business Continuity and Disaster

• Organizations need to identify and prioritize their resources in terms of: Mission Critical to accomplishing the mission of the organization Can be performed only by computers No alternative manual processing capability exists Must be restored within 36 hours

Page 54: Business Continuity and Disaster

• Critical• o Critical in accomplishing the work of the organization• o Primarily performed by computers• o Can be performed manually for a limited time period• o Must be restored starting at 36 hours and within 5 days• · Essential• o Essential in completing the work of the organization• o Performed by computers• o Can be performed manually for an extended time period• o Can be restored as early as 5 days, however it can take

longer.• ·

Page 55: Business Continuity and Disaster

• Non-essential/Non-critical: Systems and resources that can be recovered within several days or weeks of the disaster, or until the crisis has passed.

• Non-Critical to accomplishing the mission of the organization can be delayed until damaged site is restored and/or a new computer system is purchased can be performed manually.

Page 56: Business Continuity and Disaster

What if the organization’s DR involves use of backup cloud services

• Backup cloud services have become a popular trend for consumers and even some smaller businesses. While

• cloud services do address some reliability and accessibility issues quite well, their inherent design limits their

• ability to provide the needed level of immediate accessibility.• 1. Potential High Reliability - Depending on the cloud service

and actual infrastructure, backup cloud services hold the promise of being able to provide reliable backup and restore. In terms of integrity, a cloud service should be able to provide the same level of integrity as a disk-to-disk solution, but

• without the need to go to tape for off-site archival.

Page 57: Business Continuity and Disaster

• 2. Automated Offsite Storage – One of the major benefits of a backup cloud service is its trans-parent, automated nature. Data is automatically backed up and archived at an offsite, hosted data centre without any user intervention.

• 3. Slow Time to Recovery• The main disadvantage of a cloud service for backup is the

slow time to recovery. While it might be fine for recovering a few files, transferring the backup files for entire servers over the Internet can take days to weeks, depending on an organization’s Inter-net bandwidth and connection speeds.

Page 58: Business Continuity and Disaster

Question

• What key items should a good SLA agreement used to evaluate a cloud service provider contain?