STF-4 Business Continuity.pdf


  • 8/14/2019 STF-4 Business Continuity.pdf


    Copyright 2006 EMC Corporation. Do not Copy - All Rights Reserved.


    Section Objectives

    Upon completion of this section, you will be able to:

    Describe what business continuity is

    Describe the basic technologies that are enablers of data availability

    Describe basic disaster recovery techniques

    The objectives for this section are shown here. Please take a moment to read them.


    In This Section

    This section contains the following modules:

    Business Continuity Overview

    Backup and Recovery

    Business Continuity Local Replication

    Business Continuity Remote Replication



    Business Continuity Overview

    After completing this module, you will be able to:

    Define and differentiate between Business Continuity and Disaster Recovery

    Differentiate between Disaster Recovery and Disaster Restart

    Define terminology such as Recovery Point Objective and Recovery Time Objective

    Give a high level description of Business Continuity Planning

    Identify Single Points of Failure and describe solutions to eliminate them

    These are the objectives for this module. Please take a moment to review them.


    What is Business Continuity?

    Business Continuity is the preparation for, response to, and recovery from an application outage that adversely affects business operations

    Business Continuity Solutions address systems unavailability, degraded application performance, or unacceptable recovery strategies

    Since information is a primary asset for most businesses, business continuity is a major concern. This is not just a concern for the Information Technology department; it impacts the entire business. At one time, data storage was viewed as a simple issue, but the requirements have become more sophisticated. Businesses must now contend with information availability, storage, and business continuation in adverse events large or small, man-made or natural.

    Before we can talk about business continuity and solutions for business continuity, we must first define the terms. Business Continuity is the preparation for, response to, and recovery from an application outage that adversely affects business operations. Business Continuity Solutions address systems unavailability, degraded application performance, or unacceptable recovery strategies.


    Why Business Continuity

    Know the downtime costs (per hour, day, two days...)

    Lost Productivity
        Number of employees impacted (x hours out * hourly rate)

    Lost Revenue
        Direct loss
        Compensatory payments
        Lost future revenue
        Billing losses
        Investment losses

    Damaged Reputation
        Customers
        Suppliers
        Financial markets
        Banks
        Business partners

    Financial Performance
        Revenue recognition
        Cash flow
        Lost discounts (A/P)
        Payment guarantees
        Credit rating
        Stock price

    Other Expenses
        Temporary employees, equipment rental, overtime costs, extra shipping costs, travel expenses...

    There are many factors that need to be considered when calculating the cost of downtime. A

    formula to calculate the costs of the outage should capture both the cost of lost productivity of

    employees and the cost of lost income from missed sales.

    Estimated average cost of 1 hour of downtime = (Employee costs per hour) * (Number of employees affected by outage) + (Average income per hour)

    Employee costs per hour is simply the total salaries and benefits of all employees per week, divided by the average number of working hours per week.

    Average income per hour is just the total income of an institution per week, divided by the average number of hours per week that the institution is open for business.
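    The downtime formula can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, all figures are invented, and the sketch assumes the employee cost is expressed per employee per hour so that multiplying by the number of affected employees is dimensionally consistent.

```python
def hourly_rate(weekly_total: float, hours_per_week: float) -> float:
    """Convert a weekly total (salary cost or income) into an hourly figure."""
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    return weekly_total / hours_per_week


def downtime_cost_per_hour(employee_cost_per_hour: float,
                           employees_affected: int,
                           income_per_hour: float) -> float:
    """Lost productivity plus lost income for one hour of outage."""
    return employee_cost_per_hour * employees_affected + income_per_hour


# Invented figures: one employee costs 1,600 per 40-hour week, and the
# business earns 120,000 over a 60-hour trading week.
cost_per_employee_hour = hourly_rate(1_600, 40)
income_per_hour = hourly_rate(120_000, 60)
estimate = downtime_cost_per_hour(cost_per_employee_hour, 50, income_per_hour)
print(estimate)
```

    Multiplying the hourly estimate by the expected outage length gives a first approximation of the total cost; the other cost categories on the slide (reputation, financial performance, other expenses) are not captured by this formula.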


    Information Availability

    % Uptime | % Downtime | Downtime per Year | Downtime per Week
    98% | 2% | 7.3 days | 3 hrs 22 min
    99% | 1% | 3.65 days | 1 hr 41 min
    99.8% | 0.2% | 17 hrs 31 min | 20 min 10 sec
    99.9% | 0.1% | 8 hrs 45 min | 10 min 5 sec
    99.99% | 0.01% | 52.5 min | 1 min
    99.999% | 0.001% | 5.25 min | 6 sec
    99.9999% | 0.0001% | 31.5 sec | 0.6 sec

    Information Availability ensures that applications and business units have access to information

    whenever it is needed. The primary components of information availability are:

    Protection from data loss

    Ensuring data access

    Appropriate data security

    The online window for some critical applications has moved to 99.999% of the time.

    Information availability depends upon robust, functional IT systems.
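    The downtime figures in the availability table follow directly from the uptime percentage. A minimal sketch of that arithmetic, assuming a 365-day year and a 7-day week (the function name is hypothetical):

```python
SECONDS_PER_WEEK = 7 * 24 * 3600
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_seconds(uptime_percent: float, period_seconds: int) -> float:
    """Allowed downtime within a period for a given uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * period_seconds


for uptime in (98.0, 99.0, 99.9, 99.999):
    hours_per_year = downtime_seconds(uptime, SECONDS_PER_YEAR) / 3600
    minutes_per_week = downtime_seconds(uptime, SECONDS_PER_WEEK) / 60
    print(f"{uptime}% uptime: {hours_per_year:.2f} h/year, "
          f"{minutes_per_week:.2f} min/week")
```

    The table rounds these results, so small differences (for example 1 hr 41 min per week at 99% against a computed 100.8 minutes) are expected.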


    Recovery Point Objective (RPO)

    [Figure: recovery point axis running from weeks to seconds, showing Tape Backup, Periodic Replication, Asynchronous Replication, and Synchronous Replication in order from the longest to the shortest recovery point]

    Recovery Point Objective (RPO) is the point in time to which systems and data must be

    recovered after an outage. This defines the amount of data loss a business can endure. Different

    business units within an organization may have varying RPOs.
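    One way to read the RPO definition: the data written between the newest usable copy and the outage is what the business loses. The sketch below checks that window against an RPO. The function names, timestamps, and RPO values are invented for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(copy_times: list[datetime], outage_time: datetime) -> timedelta:
    """Time between the newest usable copy and the outage."""
    usable = [t for t in copy_times if t <= outage_time]
    if not usable:
        raise ValueError("no copy exists before the outage")
    return outage_time - max(usable)


def meets_rpo(copy_times: list[datetime], outage_time: datetime,
              rpo: timedelta) -> bool:
    """True if the data lost at outage_time stays within the RPO."""
    return data_loss_window(copy_times, outage_time) <= rpo


# Copies taken every six hours; outage at 14:30.
copies = [datetime(2006, 1, 1, hour) for hour in (0, 6, 12)]
outage = datetime(2006, 1, 1, 14, 30)
print(data_loss_window(copies, outage))
print(meets_rpo(copies, outage, timedelta(hours=4)))
print(meets_rpo(copies, outage, timedelta(hours=1)))
```

    In the worst case the outage happens just before the next copy, so the interval between copies must not exceed the RPO.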


    Recovery Time Objective (RTO)

    Recovery Time includes:

    Fault detection

    Recovering data

    Bringing apps back online

    [Figure: recovery time axis running from seconds to weeks, showing Global Cluster, Manual Migration, and Tape Restore in order from the shortest to the longest recovery time]

    Recovery Time Objective (RTO) is the period of time within which systems, applications, or functions must be recovered after an outage. This defines the amount of downtime that a business can endure and still survive.
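    Since recovery time includes fault detection, data recovery, and bringing applications back online, a candidate recovery method can be checked against an RTO by summing the three. A minimal sketch with invented durations and hypothetical function names:

```python
from datetime import timedelta


def total_recovery_time(fault_detection: timedelta,
                        data_recovery: timedelta,
                        application_restart: timedelta) -> timedelta:
    """Sum the three components of recovery time named on the slide."""
    return fault_detection + data_recovery + application_restart


def meets_rto(recovery_time: timedelta, rto: timedelta) -> bool:
    """True if the estimated recovery time fits within the RTO."""
    return recovery_time <= rto


# Invented estimate for a restore from tape.
tape_restore = total_recovery_time(
    fault_detection=timedelta(minutes=15),
    data_recovery=timedelta(hours=20),
    application_restart=timedelta(hours=1),
)
print(tape_restore)
print(meets_rto(tape_restore, timedelta(hours=24)))
print(meets_rto(tape_restore, timedelta(hours=4)))
```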


    Disaster Recovery versus Disaster Restart

    Most business critical applications have some level of data interdependencies

    Disaster recovery

    Restoring previous copy of data and applying logs to that copy to bring it to a known point of consistency

    Generally implies the use of backup technology

    Data copied to tape and then shipped off-site

    Requires manual intervention during the restore and recovery processes

    Disaster restart

    Process of restarting mirrored consistent copies of data and applications

    Allows restart of all participating DBMS to a common point of consistency utilizing automated application of recovery logs during DBMS initialization

    The restart time is comparable to the length of time required for the application to restart after a power failure

    There is a fundamental difference between Disaster Recovery and Disaster Restart. Disaster recovery is the process of restoring a previous copy of the data and applying logs or other necessary processes to that copy to bring it to a known point of consistency.

    Disaster restart is the restarting of mirrored, dependent-write consistent copies of data and applications, utilizing the automated application of DBMS recovery logs during DBMS initialization to bring the data and application to a transactional point of consistency.

    Disaster recovery generally implies the use of backup technology in which data is copied to tape and then shipped off-site. When a disaster is declared, the remote site copies are restored and logs are applied to bring the data to a point of consistency. Once all recoveries are completed, the data is validated to ensure it is correct.


    Disruptors of Data Availability

    Disaster (


    Causes of Downtime

    Human Error

    System Failure

    Infrastructure Failure

    Disaster

    Today, the most critical component of an organization is information. Any disaster occurrence will affect the information availability that is critical to running normal business operations.

    In our definition of disaster, the organization's primary systems, data, and applications are damaged or destroyed. Not all unplanned disruptions constitute a disaster.


    Business Continuity vs. Disaster Recovery

    Business Continuity has a broad focus on prevention:

    Predictive techniques to identify risks

    Procedures to maintain business functions

    Disaster Recovery focuses on the activities that occur after an adverse event to return the entity to normal functioning.

    Business Continuity is a holistic approach to planning, preparing, and recovering from an

    adverse event. The focus is on prevention, identifying risks, and developing procedures to

    ensure the continuity of business function. Disaster recovery planning should be included as

    part of business continuity.

    BC objectives include:

    Facilitate uninterrupted business support despite the occurrence of problems.

    Create plans that identify risks and mitigate them wherever possible.

    Provide a road map to recover from any event.

    Disaster Recovery is more about specific cures, to restore service and damaged assets after an

    adverse event. In our context, Disaster Recovery is the coordinated process of restoring systems,

    data, and infrastructure required to support key ongoing business operations.


    Business Continuity Planning (BCP)

    Includes the following activities:

    Identifying the mission or critical business functions

    Collecting data on current business processes

    Assessing, prioritizing, mitigating, and managing risk

    Risk Analysis

    Business Impact Analysis (BIA)

    Designing and developing contingency plans and disaster recovery plan (DR Plan)

    Training, testing, and maintenance

    Business Continuity Planning (BCP) is a risk management discipline. It involves the entire business, not just IT. BCP proactively identifies vulnerabilities and risks, planning in advance how to prepare for and respond to a business disruption. A business with strong BC practices in place is better able to continue running the business through the disruption and to return to business as usual.

    BCP actually reduces the risk and costs of an adverse event because the process often uncovers and mitigates potential problems.


    Business Continuity Planning Lifecycle

    [Figure: lifecycle diagram with the stages Objectives; Analysis; Design and Develop; Train, Test, and Document; Implement, Maintain, and Assess]

    The Business Continuity Planning process includes the following stages:

    1. Objectives

    Determine business continuity requirements and objectives, including scope and budget

    Select the team (include all areas of the business and subject matter expertise, internal/external)

    Create the project plan

    2. Perform analysis

    Collect information on data, business processes, infrastructure supports, dependencies, frequency of use

    Identify critical needs and assign recovery priorities

    Create a risk analysis (areas of exposure) and mitigation strategies wherever possible

    Create a Business Impact Analysis (BIA)

    Create a cost/benefit analysis: identify the cost (per hour/day, etc.) to the business when data is unavailable

    Evaluate options

    3. Design and develop the BCP/strategies

    Evaluate options

    Define roles/responsibilities

    Develop contingency scenarios

    Develop emergency response procedures

    Detail recovery, resumption, and restore procedures

    Design data protection strategies and develop infrastructure

    Implement risk management/mitigation procedures

    4. Train, test, and document

    5. Implement, maintain, and assess


    Business Impact Analysis (BIA)

    # | Business Area Affected | Impact (1-5) | Probability (1-5) | High Risk SPOF Item | Est. cost of mitigation | Events p/y | Single Loss Expectancy | Loss p/y
    1 | Entire Company | 5 | 1 | No redundant UPS for networking/phone equip | $5,800 | 0.25 | $279,056 | $69,517
    2 | Entire Company | 5 | 1 | Cisco net backbone switch not redundant | $66,456 | 0.2 | $279,066 | $55,768
    3 | Entire Company | 5 | 1 | Relocate net equip to a separate physical rack | $10,000 | 0.2 | $279,098 | $55,619
    4 | IT-All | 4 | 3 | Primary dev platforms don't have failover | $80,000 | 1.0 | $16,000 | $18,000
    5 | Entire Company | 4 | 3 | Computer room does not have sufficient UPS capacity to run on single unit | $122,000 | 0.5 | $16,000 | $8,000
    6 | IT-Intranet/B2B | 2 | 1 | No failover for development webserver | $5,000 | 1.0 | $400 | $1,800

    This is an example of Business Impact Analysis (BIA). The dollar values are arbitrary and are

    used just for illustration. BIA quantifies the impact that an outage will have to the business and

    potential costs associated with the interruption. It helps businesses channel their resources based

    on probability of failure and associated costs.
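    A common way to combine the single loss expectancy and the number of events per year is to multiply them into an expected loss per year, which can then be weighed against the cost of mitigation. The sketch below assumes that simple product; the class name, field names, and figures are illustrative, and the arbitrary dollar values in the example above do not all follow this product exactly.

```python
from dataclasses import dataclass


@dataclass
class SpofItem:
    description: str
    single_loss_expectancy: float  # cost of one occurrence
    events_per_year: float         # expected occurrences per year
    mitigation_cost: float         # one-time cost to remove the exposure

    @property
    def annual_loss(self) -> float:
        """Expected loss per year if nothing is done."""
        return self.single_loss_expectancy * self.events_per_year

    @property
    def payback_years(self) -> float:
        """Years of avoided loss needed to cover the mitigation cost."""
        if self.annual_loss == 0:
            return float("inf")
        return self.mitigation_cost / self.annual_loss


# Illustrative item with invented figures.
ups = SpofItem("Insufficient UPS capacity", 16_000, 0.5, 122_000)
print(ups.annual_loss)
print(ups.payback_years)
```

    Sorting items by expected annual loss, or by payback period, is one way to decide where to channel resources first.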


    Identifying Single Points of Failure

    [Figure: user and application clients connected over an IP network to a primary node]

    Consider the components in the picture and identify the Single Points of Failure.


    HBA Failures

    Configure multiple HBAs, and use multi-pathing software

    Protects against HBA failure

    Can provide improved performance (vendor dependent)

    [Figure: host with two HBAs connected through a switch to two storage ports]

    Configuring multiple HBAs and using multi-pathing software provides path redundancy. Upon

    detection of a failed HBA, the software can re-drive the I/O through another available path.
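    The re-drive behavior can be illustrated with a small simulation. This is not how any particular multi-pathing product is implemented; the class names and the string-based I/O request are hypothetical stand-ins.

```python
class PathFailedError(Exception):
    """Raised when an I/O cannot be completed on a path."""


class Path:
    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def submit(self, io_request: str) -> str:
        if not self.healthy:
            raise PathFailedError(self.name)
        return f"{io_request} completed via {self.name}"


class MultipathDevice:
    """Tries each configured path in turn until one completes the I/O."""

    def __init__(self, paths: list[Path]) -> None:
        self.paths = list(paths)

    def submit(self, io_request: str) -> str:
        for path in self.paths:
            try:
                return path.submit(io_request)
            except PathFailedError:
                continue  # re-drive the same I/O through the next path
        raise PathFailedError("all paths failed")


device = MultipathDevice([Path("hba0", healthy=False), Path("hba1")])
result = device.submit("read block 42")
print(result)
```

    With a single HBA the same failure would surface to the application as an I/O error, which is why the HBA counts as a single point of failure.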


    Switch/Storage Array Port Failures

    Configure multiple switches

    Make the devices available via multiple storage array ports

    [Figure: host with two HBAs connected through two switches to multiple storage array ports]

    This configuration provides switch redundancy as well as protects against storage array port

    failures.


    Disk Failures

    Use some level of RAID

    [Figure: host with two HBAs connected through two switches to multiple storage array ports]

    As seen earlier, using some level of RAID, such as RAID-1 or RAID-5, allows operation to continue in the event of a single disk failure.
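    As a reminder of why a parity-based level such as RAID-5 survives a disk failure: the parity block is the XOR of the data blocks in a stripe, so any one missing block can be rebuilt from the others. The sketch below is heavily simplified (one stripe, no parity rotation, equal-length blocks) and the helper name is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus the parity block computed from them.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Simulate losing the disk that held data[1]; rebuild it from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)
```

    RAID-1 needs no such computation, since the mirror already holds a full copy of every block.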


    Host Failures

    Clustering protects against production host failures

    [Figure: two hosts, each with two HBAs, connected through two switches to multiple storage array ports]

    Planning and configuring clusters is a complex task. At a high level:

    A cluster is two or more hosts with access to the same set of storage (array) devices.

    The simplest configuration is a two-node (host) cluster.

    One of the nodes would be the production server while the other would be configured as a standby. This configuration is described as Active/Passive.

    Participating nodes exchange heartbeats or keep-alives to inform each other about their health.

    In the event of the primary node failure, cluster management software will shift the production workload to the standby server.

    Implementation of the cluster failover process is vendor specific.

    A more complex configuration would be to have both nodes run production workload on the same set of devices. Either cluster software or the application/database should then provide a locking mechanism so that the nodes do not try to update the same areas on disk simultaneously. This would be an Active/Active configuration.
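    The Active/Passive behavior described above can be sketched as a heartbeat timeout check. Since the failover process is vendor specific, this is only a toy model: the class names, the timeout value, and the use of explicit timestamps instead of a real clock are all assumptions made for illustration.

```python
class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.last_heartbeat = 0.0

    def send_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now


class ActivePassiveCluster:
    """Shifts the workload to the standby when the active node goes quiet."""

    def __init__(self, primary: Node, standby: Node, timeout: float) -> None:
        self.active = primary
        self.standby = standby
        self.timeout = timeout

    def check(self, now: float) -> str:
        active_silent = now - self.active.last_heartbeat > self.timeout
        standby_alive = now - self.standby.last_heartbeat <= self.timeout
        if active_silent and standby_alive:
            self.active, self.standby = self.standby, self.active
        return self.active.name  # node currently running production


node_a, node_b = Node("node-a"), Node("node-b")
cluster = ActivePassiveCluster(node_a, node_b, timeout=3.0)

for t in (0.0, 1.0):           # node-a stops sending after t = 1
    node_a.send_heartbeat(t)
for t in (0.0, 1.0, 2.0, 3.0, 4.0, 5.0):
    node_b.send_heartbeat(t)

before = cluster.check(now=2.0)
after = cluster.check(now=5.0)
print(before, after)
```

    An Active/Active configuration would additionally need the locking mechanism mentioned above, which this sketch does not attempt to model.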


    Site/Storage Array Failures

    Remote replication helps protect against either entire site or storage array failures

    [Figure: host with two HBAs connected through two switches to a local storage array, which replicates to a second storage array]

    Remote replication will be explored in-depth in a later module in this section. What is not shown

    in the picture is host connectivity to the storage array in the remote site.


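    Resolving Single Points of Failure

    [Figure: user and application clients connected over a redundant IP network to a primary node and a failover node; the nodes run clustering software and exchange keep-alives; storage is reached over redundant paths and uses redundant disks (RAID 1/RAID 5), with a redundant site]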

    This example combines the methods that we have discussed to resolve single points of failure. It

    uses clustering, redundant paths and redundant disks, a redundant site, and a redundant network.


    Local Replication

    Data from the production devices is copied over to a set of target (replica) devices

    After some time, the replica devices will contain identical data as those on the production devices

    Subsequently copying of data can be halted. At this point-in-time, the replica devices can be used independently of the production devices

    The replicas can then be used for restore operations in the event of data corruption or other events

    Alternatively the data from the replica devices can be copied to tape. This off-loads the burden of backup from the production devices

    Local replication technologies offer fast and convenient methods for ensuring data availability. The different technologies and the uses of replicas for BC/DR operations will be discussed in a later module in this section. Typically local replication uses replica disk devices. This greatly speeds up the restore process, thus minimizing the RTO. Frequent point-in-time replicas also help in minimizing RPO.


    Backup/Restore

    Backup to tape has been the predominant method for ensuring data availability and business continuity

    Low cost, high capacity disk drives are now being used for backup to disk. This considerably speeds up the backup and the restore process

    Frequency of backup will be dictated by defined RPO/RTO requirements as well as the rate of change of data

    Far from being antiquated, periodic backup is still a widely used method for preserving copies of data. In the event of data loss due to corruption or other events, data can be restored up to the last backup. Evolving technologies now permit faster backups to disks. Magnetic tape drive speeds and capacities are also continually being enhanced. The various backup paradigms and the role of backup in BC/DR planning will be discussed in detail later in this section.


    Module Summary

    Key points covered in this module:

    Importance of Business Continuity

    Types of outages and their impact to businesses

    Business Continuity Planning and Disaster Recovery

    Definitions of RPO and RTO

    Difference between Disaster Recovery and Disaster Restart

    Identifying and eliminating Single Points of Failure

    These are the key points covered in this module. Please take a moment to review them.


    Backup and Recovery

    Upon completion of this module, you will be able to:

    Describe best practices for planning Backup and Recovery.

    Describe the common media and types of data that are part of a Backup and Recovery strategy.

    Describe the common Backup and Recovery topologies.

    Describe the Backup and Recovery Process.

    Describe Management considerations for Backup and Recovery.

    This lesson looks at Backup and Recovery. Backup and Recovery are a major part of the

    planning for Business Continuity.


    Lesson: Planning for Backup and Recovery

    Upon completion of this lesson, you will be able to:

    Define Backup and Recovery.

    Describe common reasons for a Backup and Recovery plan.

    Describe the business considerations for Backup and Recovery.

    Define RPO and RTO.

    Describe the data considerations for Backup and Recovery.

    Describe the planning for Backup and Recovery.

    This lesson provides an overview of the business drivers for backup and recovery and introduces

    some of the common terms used when developing a backup and recovery plan.


    What is a Backup?

    Backup is an additional copy of data that can be used for restore and recovery purposes.

    The Backup copy is used when the primary copy is lost or corrupted.

    This Backup copy can be created as a:

    Simple copy (there can be one or more copies)

    Mirrored copy (the copy is always updated with whatever is written to the primary copy)

    A Backup is a copy of the online data that resides on primary storage. The backup copy is created and retained for the sole purpose of recovering deleted, broken, or corrupted data on the primary disk.

    The backup copy is usually retained over a period of time, depending on the type of the data and on the type of backup. There are three derivatives for backup: disaster recovery, archival, and operational backup. We will review them in more detail on the next slide.

    The data that is backed up may be on such media as disk or tape, depending on the backup derivative the customer is targeting. For example, backing up to disk may be more efficient than tape in operational backup environments.


    Backup and Recovery Strategies

    Several choices are available to get the data to the backup media such as:

    Copy the data.

    Mirror (or snapshot) then copy.

    Remote backup.

    Copy then duplicate or remote copy.

    Several choices are available to get the data written to the backup media.

    You can simply copy the data from the primary storage to the secondary storage (disk or tape), onsite. This is a simple strategy, easily implemented, but it impacts the production server where the data is located, since it uses the server's resources. This may be tolerated on some applications, but not on high demand ones.

    To avoid an impact on the production application, and to perform serverless backups, you can mirror (or snap) a production volume. For example, you can mount it on a separate server and then copy it to the backup media (disk or tape). This option completely frees up the production server, at the added infrastructure cost associated with the additional resources.

    Remote backup can be used to comply with offsite requirements. A copy from the primary storage is made directly to backup media located at another site. The backup media can be a real library, a virtual library, or even a remote filesystem.

    You can copy to a first set of backup media, which is kept onsite for operational restore requirements, and then duplicate it to another set of media for offsite purposes. To simplify the procedure, you can replicate it to an offsite location, which removes any manual procedures associated with moving the backup media to another site.


    It's All About Recovery!

    Businesses back up their data to enable its recovery in case of potential loss.

    Businesses also back up their data to comply with regulatory requirements.

    Types of backup derivatives:

    Disaster Recovery

    Archival

    Operational

    There are three different Backup derivatives:

    Disaster Recovery addresses the requirement to be able to restore all, or a large part of, an IT infrastructure in the event of a major disaster.

    Archival is a common requirement used to preserve transaction records, email, and other business work products for regulatory compliance. The regulations could be internal, governmental, or perhaps derived from specific industry requirements.

    Operational is typically the collection of data for the eventual purpose of restoring, at some point in the future, data that has become lost or corrupted.


    Reasons for a Backup Plan

    Hardware Failures

    Human Factors

    Application Failures

    Security Breaches

    Disasters

    Regulatory and Business Requirements

    Reasons for a backup plan include:

    Physical damage to a storage element (such as a disk) that can result in data loss.

    People make mistakes and unhappy employees or external hackers may breach security and

    maliciously destroy data.

    Software failures can destroy or lose data and viruses can destroy data, impact data integrity,

    and halt key operations.

    Physical security breaches can destroy equipment that contains data and applications.

    Natural disasters and other events such as earthquakes, lightning strikes, floods, tornados,

    hurricanes, accidents, chemical spills, and power grid failures can cause not only the loss of

    data but also the loss of an entire computer facility. Offsite data storage is often justified to

    protect a business from these types of events.

    Government regulations may require certain data to be kept for extended timeframes.

    Corporations may establish their own extended retention policies for intellectual property to protect themselves against litigation. The regulations and business requirements that drive data

    archiving generally require data to be retained at an offsite location.


    How does Backup Work?

    Client/Server Relationship

    Server

    Directs Operation

    Maintains the Backup Catalog

    Client

    Gathers Data for Backup (a backup client sends backup data to a backup server or storage node).

    Storage Node

    Backup products vary, but they do have some common characteristics. The basic architecture of

    a backup system is client-server, with a backup server and some number of backup clients or

    agents. The backup server directs the operations and owns the backup catalog (the information

    about the backup). The catalog contains the table-of-contents for the data set. It also contains information about the backup session itself.

    The backup server depends on the backup client to gather the data to be backed up. The backup

    client can be local or it can reside on another system, presumably to back up the data visible to

    that system. A backup server receives backup metadata from backup clients to perform its

    activities.

    There is another component called a storage node. The storage node is the entity responsible for

    writing the data set to the backup device. Typically there is a storage node packaged with the

    backup server and the backup device is attached directly to the backup server's host platform.

    Storage nodes play an important role in backup planning because they can be used to consolidate backup

    servers.


    How does Backup Work?

    [Figure: backup data flow. Labels: Clients, Servers, Backup Server & Storage Node, Catalog, Metadata, Data Set, Disk Storage, Tape Backup]

    The following represents a typical Backup process:

    The Backup Server initiates the backup process (starts the backup application).

    The Backup Server sends a request to a server: "send me your data."

    The server sends the data to the Backup Server and/or Storage Node.

    The Storage Node sends the data to the tape storage device and the Backup Server begins

    building the catalog (metadata) of the backup session.

    When all of the data has been transferred from the server to the Backup Server, the Backup

    Server writes the catalog to a disk file and closes the connection to the tape device.
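    The sequence above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names BackupServer and StorageNode, the run_backup method, and the catalog fields are hypothetical and do not correspond to any particular backup product. A Python list stands in for the tape device.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class StorageNode:
    """Writes data sets to the backup device (a list stands in for tape)."""
    device: list = field(default_factory=list)

    def write(self, client: str, name: str, payload: bytes) -> int:
        self.device.append((client, name, payload))
        return len(self.device) - 1  # position of the data set on the device


@dataclass
class BackupServer:
    """Directs the operation and owns the catalog (backup metadata)."""
    storage_node: StorageNode
    catalog: list = field(default_factory=list)

    def run_backup(self, client: str, files: dict) -> list:
        session = []
        for name, payload in files.items():  # the client sends its data
            position = self.storage_node.write(client, name, payload)
            session.append({
                "client": client,
                "file": name,
                "size": len(payload),
                "position": position,
                "time": datetime.now(timezone.utc).isoformat(),
            })
        self.catalog.extend(session)  # catalog is saved once the transfer ends
        return session


node = StorageNode()
server = BackupServer(node)
entries = server.run_backup(
    "mail-server", {"inbox.db": b"abc", "users.txt": b"hello"}
)
print(len(node.device), [e["file"] for e in entries])
```

    Note that the data itself goes to the storage node while only metadata (file name, size, position, time) ends up in the catalog, which mirrors the split between data and metadata described above.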


    Business Considerations

    Customer business needs determine:

    What are the restore requirements (RPO and RTO)?

    Where and when will the restores occur?

    What are the most frequent restore requests?

    Which data needs to be backed up?

    How frequently should data be backed up?

    hourly, daily, weekly, monthly

    How long will it take to back up?

    How many copies to create?

    How long to retain backup copies?

    Some important decisions that need consideration before implementing a Backup/Restore

    solution are shown above. Some examples include:

    The Recovery Point Objective (RPO)

    The Recovery Time Objective (RTO)

    The media type to be used (disk or tape)

    Where and when the restore operations will occur, especially if an alternative host will be

    used to receive the restore data.

    When to perform backups.

    The granularity of backups: Full, Incremental, or Cumulative.

    How long to keep the backup; for example, some backups need to be retained for 4 years,

    others just for 1 month.

    Whether it is necessary to take copies of the backup.


    Data Considerations: File Characteristics

    Location

    Size

    Number

    Location:

    Many organizations have dozens of heterogeneous platforms that support a complex

    application. Consider a data warehouse where data from many sources is fed into the

    warehouse. When this scenario is viewed as "The Data Warehouse Application," it easily

    fits this model. Some of the issues are:

    How the backups for subsets of the data are synchronized

    How these applications are restored

    Size:

    Backing up a large amount of data that consists of a few big files may have less system

    overhead than backing up a large number of small files. If a file system contains millions of

    small files, the very nature of searching the file system structures for changed files can take

    hours, since the entire file structure is searched.

    Number:

    A file system containing one million files with a ten-percent daily change rate will

    potentially have to create 100,000 entries in the backup catalog. This brings up other issues

    such as:

    How a massive file system search impacts the system

    Search time/Media impact

    Is there an impact on tape start/stop processing?
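    The arithmetic behind the file-count example can be written out as a small sketch. The function name daily_backup_load and the returned field names are hypothetical; the one million files and ten-percent change rate are the figures from the text. The point is that the whole file structure is scanned even though only the changed files produce catalog entries.

```python
def daily_backup_load(total_files: int, daily_change_rate: float) -> dict:
    """Estimate the work one daily backup of changed files generates."""
    changed = round(total_files * daily_change_rate)
    return {
        "files_scanned": total_files,  # the entire file structure is searched
        "catalog_entries": changed,    # one catalog entry per changed file
    }


load = daily_backup_load(1_000_000, 0.10)
print(load)
```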


    Data Considerations: Data Compression

    Compressibility depends on the data type, for example:

    Application binaries do not compress well.

    Text compresses well.

    JPEG/ZIP files are already compressed and expand if compressed again.

    Many backup devices, such as tape drives, have built-in hardware compression technologies. To

    effectively use these technologies, it is important to understand the characteristics of the data.

    Some data, such as application binaries, do not compress well. Text data can compress very

    well, while other data, such as JPEG and ZIP files, are already compressed.
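    The effect of data type on compressibility can be observed with a short experiment. This sketch uses Python's software zlib module, not the hardware compression built into a tape drive, so the ratios are illustrative only. Pseudo-random bytes are used here as a stand-in for already-compressed data such as JPEG or ZIP files; that substitution is an assumption of the example.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better)."""
    return len(zlib.compress(data)) / len(data)


# Repetitive text, which compresses very well.
text = b"Backups exist so that data can be recovered.\n" * 2000

# Pseudo-random bytes: a stand-in for already-compressed content.
rng = random.Random(0)
random_like = bytes(rng.getrandbits(8) for _ in range(len(text)))

text_ratio = compression_ratio(text)
random_ratio = compression_ratio(random_like)
print(f"text: {text_ratio:.3f}  already-compressed stand-in: {random_ratio:.3f}")
```

    A ratio above 1.0 for the second case means the output is slightly larger than the input, which is the expansion the slide refers to.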


    Data Considerations: Retention Periods

    Operational

    Data sets on primary media (disk) up to the point where most restore requests are satisfied, then moved to secondary storage (tape).

    Disaster Recovery

    Driven by the organization's disaster recovery policy

    Portable media (tapes) sent to an offsite location / vault.

    Replicated over to an offsite location (disk).

    Backed up directly to the offsite location (disk, tape or emulated tape).

    Archiving

    Driven by the organization's policy.

    Dictated by regulatory requirements.

    As mentioned before, there are three types of backup models (Operational, Disaster Recovery,

    and Archive). Each can be defined by its retention period. Retention Periods are the length of

    time that a particular version of a dataset is available to be restored.

    Retention periods are driven by the type of recovery the business is trying to achieve:

    For operational restore, data sets could be kept on a primary disk-based backup storage

    target for the period during which most restore requests are likely to occur, and then

    moved to a secondary backup storage target, such as tape, for long-term offsite storage.

    For disaster recovery, backups must be done and moved to an offsite location.

    For archiving, requirements usually will be driven by the organization's policy and

    regulatory conformance requirements. Tapes can be used for some applications, but for

    others a more robust and reliable solution, such as disks, may be more appropriate.


    Lesson: Summary

    Topics in this lesson included:

    Backup and Recovery definitions and examples.

    Common reasons for Backup and Recovery.

    The business considerations for Backup and Recovery.

    Recovery Point Objectives and Recovery Time Objectives.

    The data considerations for Backup and Recovery

    The planning for Backup and Recovery.

    In this lesson we reviewed the business and data considerations when planning for Backup and

    Recovery including:

    What is a Backup and Recovery?

    What is the Backup and Recovery process?

    Business recovery needs

    RPO: Recovery point objectives

    RTO: Recovery time objectives

    Data characteristics

    Files, compression, retention


    Lesson: Backup and Recovery Methods

    Upon completion of this lesson, you will be able to:

    Describe Hot and Cold Backups.

    Describe the levels of Backup Granularity.

    We've discussed the importance of and considerations for a Backup Plan. This lesson now provides

    an overview of the different methods for creating a backup set.


    Database Backup Methods

    Hot Backup: production is not interrupted.

    Cold Backup: production is interrupted.

    Backup Agents manage the backup of different data types such as:

    Structured (such as databases)

    Semi-structured (such as email)

    Unstructured (file systems)

    Databases can be backed up using two different methods:

    A Hot backup, which means that the application is still up and running, with users accessing

    it, while backup is taking place.

    A Cold backup, which means that the application will be shut down for the backup to take

    place.

    Most backup applications offer various Backup Agents to do these kinds of operations. There

    will be different agents for different types of data and applications.


    Backup Granularity and Levels

    Full Backup

    Cumulative (Differential)

    Incremental

    [Figure: Full, Cumulative, and Incremental backups]

    The granularity and levels for backups depend on business needs, and, to some extent,

    technological limitations. Some backup strategies define as many as ten levels of backup. IT

    organizations use a combination of these to fulfill their requirements. Most use some

    combination of Full, Cumulative, and Incremental backups.

    A Full backup is a backup of all data on the target volumes, regardless of any changes made to

    the data itself.

    An Incremental backup contains the changes since the last backup, of any type, whichever was

    most recent.

    A Cumulative backup, also known as a Differential backup, is a type of incremental that

    contains all changes made since the last full backup.


    Restoring an Incremental Backup

    Key Features

    Files that have changed since the last full or incremental backup are backed up.

    Fewest files to be backed up, therefore faster backup and less storage space.

    Longer restore because the last full and all subsequent incremental backups must be applied.

    [Figure: Monday Full Backup (Files 1, 2, 3); Tuesday Incremental (File 4); Wednesday Incremental (File 3); Thursday Incremental (File 5); Production after restore (Files 1, 2, 3, 4, 5)]

    The following is an example of an incremental backup and restore:

    A full backup of the business data is taken on Monday evening. Each day after that, an

    incremental backup is taken. These incremental backups back up only files that are new or that have changed since the last full or incremental backup.

    On Tuesday, a new file is added, File 4. No other files have changed. Since File 4 is a new file

    added after the previous backup on Monday evening, it will be backed up Tuesday evening.

    On Wednesday, there are no new files added since Tuesday, but File 3 has changed. Since File

    3 was changed after the previous evening backup (Tuesday), it will be backed up Wednesday

    evening.

    On Thursday, no files have changed but a new file has been added, File 5. Since File 5 was

    added after the previous evening backup, it will be backed up Thursday evening.

    On Friday morning, there is a data corruption, so the data must be restored from tape.

    The first step is to restore the full backup from Monday evening.

    Then, every incremental backup that was done since the last full backup must be applied, which, in this example, means the Tuesday, Wednesday, and Thursday incremental backups.
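    The incremental example can be replayed as a small simulation. This is a minimal sketch: each backup set is modeled as a dictionary of file name to version, and the names full_monday, incrementals, and restore_incremental are hypothetical. The version strings ("v1", "v2") are invented markers used only to show that the changed File 3 wins over the copy in the full backup.

```python
# Each backup set maps file name -> version captured that evening.
full_monday = {"File 1": "v1", "File 2": "v1", "File 3": "v1"}
incrementals = [
    ("Tuesday", {"File 4": "v1"}),    # new file
    ("Wednesday", {"File 3": "v2"}),  # changed file
    ("Thursday", {"File 5": "v1"}),   # new file
]


def restore_incremental(full: dict, incrementals: list) -> tuple:
    """Apply the full backup, then every incremental in chronological order."""
    restored = dict(full)
    sets_applied = 1  # the full backup
    for _day, changes in incrementals:
        restored.update(changes)
        sets_applied += 1
    return restored, sets_applied


restored, sets_applied = restore_incremental(full_monday, incrementals)
print(sorted(restored), restored["File 3"], sets_applied)
```

    Four backup sets have to be read to complete this restore, which is why incremental schemes trade fast backups for longer restores.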


    Restoring a Cumulative Backup

    Key Features

    More files to be backed up, therefore it takes more time to back up and uses more storage space.

    Much faster restore because only the last full and the last cumulative backup must be applied.

    [Figure: Monday Full Backup (Files 1, 2, 3); Tuesday Cumulative (File 4); Wednesday Cumulative (Files 4, 5); Thursday Cumulative (Files 4, 5, 6); Production after restore (Files 1, 2, 3, 4, 5, 6)]

    The following is an example of cumulative backup and restore:

    A full backup of the data is taken on Monday evening. Each day after that, a cumulative backup

    is taken. These cumulative backups back up ALL FILES that have changed since the LAST FULL BACKUP.

    On Tuesday, File 4 is added. Since File 4 is a new file that has been added since the last full

    backup, it will be backed up Tuesday evening.

    On Wednesday, File 5 is added. Now, since both File 4 and File 5 are files that have been added

    or changed since the last full backup, both files will be backed up Wednesday evening.

    On Thursday, File 6 is added. Again, File 4, File 5, and File 6 are files that have been added or

    changed since the last full backup; all three files will be backed up Thursday evening.

    On Friday morning, there is a corruption of the data, so the data must be restored from tape.

    The first step is to restore the full backup from Monday evening.

    Then, only the backup from Thursday evening is restored because it contains all the

    new/changed files from Tuesday, Wednesday, and Thursday.
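    The cumulative example can be replayed the same way. As before, this is only a sketch: backup sets are modeled as dictionaries of file name to version, and the names full_monday, cumulatives, and restore_cumulative are hypothetical.

```python
# Each backup set maps file name -> version captured that evening.
full_monday = {"File 1": "v1", "File 2": "v1", "File 3": "v1"}
cumulatives = [
    ("Tuesday", {"File 4": "v1"}),
    ("Wednesday", {"File 4": "v1", "File 5": "v1"}),
    ("Thursday", {"File 4": "v1", "File 5": "v1", "File 6": "v1"}),
]


def restore_cumulative(full: dict, cumulatives: list) -> tuple:
    """Apply the full backup, then only the most recent cumulative backup."""
    restored = dict(full)
    sets_applied = 1  # the full backup
    if cumulatives:
        _day, changes = cumulatives[-1]  # the last one holds every change
        restored.update(changes)
        sets_applied += 1
    return restored, sets_applied


restored, sets_applied = restore_cumulative(full_monday, cumulatives)
print(sorted(restored), sets_applied)
```

    Only two backup sets are read, at the cost of each nightly backup growing larger as the week goes on.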


    Lesson: Backup Architecture Topologies

    Upon completion of this lesson, you will be able to:

    Describe DAS, LAN, SAN, Mixed topologies.

    Describe backup media considerations.

    We have discussed the importance of the Backup plan and the different methods used when

    creating a backup set. This lesson provides an overview of the different topologies and media

    types that are used to support creating a backup set.


    Backup Architecture Topologies

    There are 3 basic backup topologies:

    Direct Attached Based Backup

    LAN Based Backup

    SAN Based Backup

    These topologies can be integrated, forming a mixed topology

    There are three basic topologies that are used in a backup environment: Direct Attached Based

    Backup, LAN Based Backup, and SAN Based Backup.

    There is also a fourth topology, called Mixed, which is formed when mixing two or more of these topologies in a given situation.


    Direct Attached Based Backups

    [Figure: Direct Attached Based Backup. Labels: Catalog, Backup Server, LAN, Metadata, Storage Node, Backup Media, Data]

    Here, the backup data flows directly from the host to be backed up to the tape, without utilizing

    the LAN. In this model, there is no centralized management and it is difficult to grow the

    environment.

    Direct Attached Based Backups are performed directly from the backup client's disk to the backup client's tape devices.

    The key advantage of direct-attached backups is speed. The tape devices can operate at the speed of the channels. Direct-attached backups optimize backup and restore speed since the tape devices are close to the data source and dedicated to the host.

    The disadvantages are that direct-attached backups impact host and application performance, since backups consume host I/O bandwidth, memory, and CPU resources, and that they potentially have distance restrictions if short-distance connections such as SCSI are used.


    LAN Based Backups

    [Figure: LAN Based Backup. Labels: Backup Server, LAN, Storage Node, Mail Server, File Server, Database Server, with Metadata and Data flows]

    In this model, the backup data flows from the host to be backed up to the tape through the LAN. There is centralized management, but there may be an issue with LAN utilization since all data goes through it.

    As we have defined previously, Backup Metadata contains information about what has been backed up, such as file names, time of backup, size, permissions, ownership, and most importantly, tracking information for rapid location and restore. It also indicates where it has been stored, for example, which tape. Data, the contents of files, databases, etc., is the primary information source to be backed up.

    In a LAN Based Backup, the Backup Server is the central control point for all backups. The metadata and backup policies reside in the Backup Server. Storage Nodes control backup devices and are controlled by the Backup Server.

    The advantages of LAN Based Backup include the following:

    LAN backups enable an organization to centralize backups and pool tape resources.

    The centralization and pooling can enable standardization of processes, tools, and backup media. Centralization of tapes can also improve operational efficiency.

    Disadvantages are:

    The backup process has an impact on production systems, the client network, and the applications.

    It consumes CPU, I/O bandwidth, LAN bandwidth, and memory.

    In order to maintain finite backup points, applications might have to be halted and databases shut down.


    SAN Based Backups (LAN Free)

    [Figure: SAN Based Backup. Labels: LAN, SAN, Backup Server, Storage Node, Mail Server, with Metadata and Data flows]

    A SAN based backup, also known as LAN Free backup, is achieved when there is no backup

    data movement over the LAN. In this case, all backup data travels through a SAN to the

    destination backup device.

    This type of backup still requires network connectivity from the Storage Node to the Backup

    Server, since metadata always has to travel through the LAN.

    LAN-free backups use Storage Area Networks (SANs) to move backup data rapidly and reliably.

    The SAN is usually used in conjunction with backup software that supports tape device sharing.

    A SAN-enabled backup infrastructure introduces these advantages to the backup process. It

    provides Fibre Channel performance, reliability, and distance. It requires fewer processes and

    reduced overhead. It does not use the LAN to move backup data and eliminates or reduces

    dedicated backup servers. Finally, it improves backup and restore performance.


    SAN/LAN Mixed Based Backups

    [Figure: SAN/LAN Mixed Based Backup. Labels: LAN, SAN, Backup Server, Storage Node, Mail Server, Database Server, with Metadata and Data flows]

    A SAN/LAN Mixed Based Backup environment is achieved by using two or more of the

    topologies described in the previous slides. In this example, some servers are SAN based while

    others are LAN based.


    Multiple Streams on Tape Media

    Multiple streams interleaved to achieve higher throughput on tape

    Keeps the tape streaming, for maximum write performance

    Helps prevent tape mechanical failure

    Greatly increases time to restore

    [Figure: a tape holding interleaved data from Stream 1, Stream 2, and Stream 3]

    Tape drive streaming is recommended by all vendors in order to keep the drive busy. If you

    do not keep the drive busy during the backup process (writing), performance will suffer.

    Multiple streaming helps to improve performance drastically, but it also creates an issue:

    the backup data becomes interleaved, and thus recovery times are increased.
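    The restore penalty of interleaving can be illustrated with a toy model. This sketch assumes a simple round-robin interleave of fixed-size blocks, which is a simplification of how real backup software multiplexes streams. The names interleave and blocks_read_to_restore and the block labels are hypothetical.

```python
from itertools import zip_longest


def interleave(streams: dict) -> list:
    """Write blocks from every stream to one tape in round-robin order."""
    labeled = [[(name, block) for block in blocks]
               for name, blocks in streams.items()]
    tape = []
    for row in zip_longest(*labeled):
        tape.extend(item for item in row if item is not None)
    return tape


def blocks_read_to_restore(tape: list, stream_name: str) -> int:
    """Blocks the drive passes over before the stream's last block is read."""
    last = max(i for i, (name, _) in enumerate(tape) if name == stream_name)
    return last + 1


streams = {
    "Stream 1": ["a1", "a2", "a3"],
    "Stream 2": ["b1", "b2", "b3"],
    "Stream 3": ["c1", "c2", "c3"],
}
tape = interleave(streams)
needed = blocks_read_to_restore(tape, "Stream 1")
print([name for name, _ in tape[:3]], needed)
```

    Restoring the three blocks of Stream 1 means passing over seven blocks on the interleaved tape instead of three on a dedicated one, which is the source of the longer restore time.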


    Backup to Disk

    Backup to disk minimizes tape in backup environments by using disk as the primary destination device

    Cost benefits

    No process changes needed

    Better service levels

    Backup to disk aligns backup strategy to RTO and RPO

    Backup to disk replaces tape and its associated devices, as the primary target for backup, with

    disk. Backup to disk systems offer major advantages over equivalent scale tape systems, in

    terms of capital costs, operating costs, support costs, and quality of service. It can be

    implemented fully on day 1 or in a phased approach.

    While no process changes are needed, any number of enhancements to the process, and the services

    provided, are now possible. Backup to disk can be a great enabler. Instead of having tape

    technology drive the business processes, the business goals drive the backup strategy.


    Tape versus Disk Restore Comparison

    Typical Scenario: 800 users, 75 MB mailbox, 60 GB database

    [Figure: bar chart of Recovery Time in Minutes*. Tape Backup / Restore: 108 minutes. Disk Backup / Restore: 24 minutes.]

    *Total time from point of failure to return of service to e-mail users

    Source: EMC Engineering and EMC IT

    This example shows a typical recovery scenario using tape and disk. As you can see, recovery

    with disk provides much faster recovery than does recovery with tape.


    Keep in mind that this example involves data recovery only. The time it takes to bring the

    application online is a separate matter. Even so, you can see in this example that the benefit was

    a restore roughly four and a half times faster (24 minutes versus 108) than it would have been with tape. What you don't see is the

    mitigated risk of media failure, and time saved in not having to locate and load the correct tapes

    before being able to begin the recovery process.


    Three Backup / Restore Solutions based on RTO

    Typical Scenario: 800 users, 75 MB mailbox; 60 GB DB (restore time); 500 MB logs (log playback)

    Recovery Time in Minutes* (restore time + log playback = total):

    Backup on tape: 108 min. + 17 min. = 125 minutes

    Backup on ATA: 24 min. + 17 min. = 41 minutes

    BCV / Clone: 2 min. + 17 min. = 19 minutes

    Time of last image dictates the log playback time

    Larger data sets extend the recovery time (ATA and tape)

    *Total time from point of failure to return of service to e-mail users

    The diagram shows typical recovery scenarios using different technical solutions. As you can see, recovery with Business Continuance Volumes (BCVs) or clones provides the quickest recovery method.

    It is important to note that using BCVs or clones on disk enables you to make copies of your data more often. This improves RPO (the point from which you can recover). It also improves RTO because the log files will be smaller, which reduces the log playback time.
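    The totals on the slide are simply restore time plus log playback time. The sketch below reproduces that addition using the slide's figures (108, 24, and 2 minutes of restore time, and 17 minutes of log playback in each case). The variable names are hypothetical, and the numbers apply only to the slide's scenario.

```python
# Restore times in minutes, taken from the slide's scenario.
restore_minutes = {
    "Backup on tape": 108,
    "Backup on ATA": 24,
    "BCV / Clone": 2,
}
log_playback_minutes = 17  # identical for all three solutions on the slide

# Total recovery time = restore time + log playback time.
total_minutes = {
    solution: restore + log_playback_minutes
    for solution, restore in restore_minutes.items()
}
fastest = min(total_minutes, key=total_minutes.get)
print(total_minutes, fastest)
```

    Because log playback is a fixed 17 minutes here, it dominates the BCV / Clone total, which is why taking images more often (and so shortening the logs) improves RTO further.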


    Traditional Backup, Recovery and Archive Approach

    Production environment grows

    Requires constant tuning and data placement to maintain performance

    Need to add more tier-1 storage

    Backup environment grows

    Backup windows get longer and jobs do not complete

    Restores take longer

    Requires more tape drives and silos to keep up with service levels

    Archive environment grows

    Impacts flexibility to retrieve content when requested

    Requires more media, adding management cost

    No investment protection for long-term retention requirements

    [Figure: Production, Backup Process, and Archive Process]

    In a traditional approach for backup and archive, businesses take a backup of production.

    Typically backup jobs use weekly full backups and nightly incremental backups. Based on

    business requirements, they will then copy the backup jobs and eject the tapes to have them sent

    offsite, where they will be stored for a specified amount of time.

    The problem with this approach is simple - as the production environment grows, so does the

    backup environment.


    Differences Between Backup / Recovery & Archive

    Backup / Recovery | Archive

    A secondary copy of information | Primary copy of information

    Used for recovery operations | Available for information retrieval

    Improves availability by enabling application to be restored to a specific point in time | Adds operational efficiencies by moving fixed / unstructured content out of operational environment

    Typically short-term (weeks or months) | Typically long-term (months, years, or decades)

    Not for regulatory compliance, though some are forced to use | Useful for compliance and should take into account information-retention policy

    Data typically overwritten on periodic basis (e.g., monthly) | Data typically maintained for analysis, value generation, or compliance

    Backup/Recovery and Archiving support different business goals. This slide compares and

    contrasts some of the differences that are significant.


    New Architecture for Backup, Recovery & Archive

    Understand the environment

    Actively archive valuable information to tiered storage

    Back up active production information to disk

    Retrieve from archive or recover from backup

    [Figure: Production, Backup Process, and Archive Process, annotated with step numbers 1 to 4]

    The recovery process is much more important than the backup process. It is based on the

    appropriate recovery-point objectives (RPOs) and recovery-time objectives (RTOs). The process

    usually drives a decision to have a combination of technologies in place, from online Business

    Continuance Volumes (BCVs), to backup to disk, to backup to tape for long-term, passive RPOs.

    Archive processes are determined not only by the required retention times, but also by retrieval-

    time service levels and the availability requirements of the information in the archive.

    For both processes, a combination of hardware and software is needed to deliver the appropriate

    service level. The best way to discover the appropriate service level is to classify the data and

    align the business applications with it.


    Lesson: Summary

    Topics in this lesson included:

    The DAS, LAN, SAN, and Mixed topologies.

    Backup media considerations.

    This lesson provided an overview of the different topologies and media types that support creating a backup set.


    Lesson: Managing the Backup Process

    Upon completion of this lesson, you will be able to:

    Describe features and functions of common Backup/Recovery applications.

    Describe the Backup/Recovery process management considerations.

    Describe the importance of the information found in Backup Reports and in the Backup Catalog.

    We have discussed the planning and operations of creating a Backup. This lesson provides an overview of Management activities and applications that help manage the Backup and Recovery process.


    How a Typical Backup Application Works

    Backup clients are grouped and associated with a Backup schedule that determines when and which backup type will occur.

    Groups are associated with Pools, which determine which backup media will be used.

    Each piece of backup media has a unique label.

    Information about the backup is written to the Backup Catalog during and after it completes. The Catalog shows:

    when the Backup was performed, and

    which media was used (label).

    Errors and other information are also written to a log.

    The process for using a Backup application includes the following:

    Backup clients are grouped and associated with a Backup schedule that determines when and which backup type will occur.

    Groups are associated with Pools, which determine which backup media will be used. Each piece of backup media has a unique label.

    Information about the backup is written to the Backup Catalog during and after it completes. The Catalog shows when the Backup was performed, and which media was used (label).

    Errors and other information are also written to a log.
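
    The relationships described above (clients in groups, groups tied to a schedule and a pool, labeled media, a catalog, and a log) can be sketched as a small data model. This is only an illustrative sketch, not the design of any particular backup product: the class names, the weekday-keyed schedule, and the media label "TAPE001" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Pool:
    """A named set of labeled backup media (labels are hypothetical)."""
    name: str
    media_labels: list[str]


@dataclass
class Group:
    """Backup clients that share a schedule and a pool."""
    name: str
    clients: list[str]
    schedule: dict[int, str]  # weekday (0 = Monday) -> backup type
    pool: Pool


@dataclass
class CatalogEntry:
    """What the catalog records: when the backup ran and on which media."""
    client: str
    backup_type: str
    performed_at: datetime
    media_label: str


@dataclass
class BackupServer:
    catalog: list[CatalogEntry] = field(default_factory=list)
    log: list[str] = field(default_factory=list)

    def run_group(self, group: Group, now: datetime) -> None:
        backup_type = group.schedule.get(now.weekday())
        if backup_type is None:
            self.log.append(f"{group.name}: nothing scheduled")
            return
        label = group.pool.media_labels[0]  # simplistic media selection
        for client in group.clients:
            # A real product would move data here; this sketch only records it.
            self.catalog.append(CatalogEntry(client, backup_type, now, label))
            self.log.append(f"{client}: {backup_type} backup written to {label}")
```

    Running a group on a scheduled day adds one catalog entry per client; on any other day only a log line is written.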


    Backup Application User Interfaces

    There are typically two types of user interfaces:

    Command Line Interface (CLI)

    Graphical User Interface (GUI)

    There are typically two types of user interfaces. With a Command Line Interface (CLI), backup administrators usually write scripts to automate common tasks, such as sending reports via email.

    A Graphical User Interface (GUI) controls the backup and restore process, multiple backup servers, multiple storage nodes, and multiple platforms/operating systems. It is a single, easy-to-use interface that covers the most common (if not all) administrative tasks.


    Managing the Backup and Restore Process

    Running the B/R Application: Backup

    The backup administrator configures it to be started automatically most (if not all) of the time

    Most backup products offer the ability for the backup client to initiate its own backup (usually disabled)

    Running the B/R Application: Restore

    There is usually a separate GUI to manage the restore process

    Information is pulled from the backup catalog when the user is selecting the files to be restored

    Once the selection is finished, the backup server starts reading from the required backup media, and the files are sent to the backup client

    There are common tasks associated with managing a Backup or Restore activity using the B/R Application. For backup, the backup administrator configures the backup to start automatically most (if not all) of the time. Most backup products also allow the backup client to initiate its own backup (Note: this feature is usually disabled).

    For restore, there is usually a separate GUI to manage the restore process. Information is pulled from the backup catalog when the user is selecting the files to be restored. Once the selection is finished, the backup server starts reading from the required backup media, and the files are sent to the backup client.


    Backup Reports

    Backup products also offer reporting features.

    These features rely on the backup catalog and log files.

    Reports are meant to be easy to read and provide important information such as:

    Amount of data backed up

    Number of completed backups

    Number of incomplete backups (failed)

    Types of errors that may have occurred

    Additional reports may be available, depending on the backup software product used.

    Backup products also offer reporting features. These features rely on the backup catalog and log files. Reports are meant to be easy to read and provide important information such as amount of data backed up, number of completed backups, number of incomplete backups (failed), and types of errors that may have occurred. Additional reports may be available, depending on the backup software product used.
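
    As a rough illustration of how such a report could be derived from catalog and log data, the sketch below summarizes a list of backup records into the four figures named above. The record layout (dictionaries with 'status', 'bytes', and 'error' keys) is an assumption made for this example and does not reflect any specific product's catalog format.

```python
from collections import Counter


def summarize(records):
    """Build a simple report from hypothetical backup records.

    Each record is a dict with 'status' ('completed' or 'failed'),
    'bytes' (amount backed up) and, for failed backups, 'error'.
    """
    completed = [r for r in records if r["status"] == "completed"]
    failed = [r for r in records if r["status"] == "failed"]
    return {
        "bytes_backed_up": sum(r["bytes"] for r in completed),
        "completed_backups": len(completed),
        "incomplete_backups": len(failed),
        "error_types": dict(Counter(r["error"] for r in failed)),
    }
```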


    Importance of the Backup Catalog

    As you can see, backup operations strongly rely on the backup catalog

    If the catalog is lost, the backup software alone has no means to determine where to find a specific file backed up two months ago, for example

    It can be reconstructed, but this usually means that all of the backup media (e.g., tapes) have to be read

    It's a good practice to protect the catalog

    By replicating the file system where it resides to a remote location

    By backing it up

    Some backup products have built-in mechanisms to protect their catalog (such as automatic backup)

    As you can see, backup operations strongly rely on the backup catalog. If the catalog is lost, the backup software alone has no means to determine where to find a specific file backed up in the past. It can be reconstructed, but this usually means that all of the backup media (e.g., tapes) have to be read. It's a good practice to protect the catalog by replicating the file system where it resides to a remote location, and by backing it up. Some backup products have built-in mechanisms to protect their catalog (such as automatic backup).
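
    The difference between a catalog lookup and a catalog reconstruction can be illustrated with a short sketch. Here the catalog is assumed to be a simple mapping from media label to the set of files stored on that media, and each piece of media is represented by a hypothetical callable that returns its contents when read; neither reflects a real product's catalog structure.

```python
def find_media(catalog, path):
    """Return the labels of media holding `path`, using only the catalog."""
    return sorted(label for label, files in catalog.items() if path in files)


def rebuild_catalog(all_media):
    """Reconstruct a lost catalog by reading every piece of media.

    `all_media` maps a media label to a callable that reads that media
    and returns the file paths found on it.
    """
    catalog = {}
    for label, read_contents in all_media.items():
        catalog[label] = set(read_contents())  # one full read per media
    return catalog
```

    With the catalog intact, locating a file touches no media at all; without it, every tape must be read once before the same question can be answered.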


    Lesson: Summary

    Topics in this lesson included:

    The features and functions of common Backup/Recovery applications.

    The Backup/Recovery process management considerations.

    The importance of the information found in Backup Reports and in the Backup Catalog.

    This lesson provided an overview of Backup and Recovery management activities and tools.


    Module Summary

    Key points covered in this module:

    The best practices for planning Backup and Recovery.

    The common media and types of data that are part of a Backup and Recovery strategy.

    The common Backup and Recovery topologies.

    The Backup and Recovery Process.

    Management considerations for Backup and Recovery.

    These are the key points covered in this module. Please take a moment to review them.


    Local Replication

    After completing this module you will be able to:

    Discuss replicas and the possible uses of replicas

    Explain consistency considerations when replicating file systems and databases

    Discuss host and array based replication technologies

    Functionality

    Differences

    Considerations

    Selecting the appropriate technology

    In this section, we will look at what replication is, technologies used for creating local replicas, and things that need to be considered when creating replicas.


    What is Replication?

    Replica - An exact copy (in all details)

    Replication - The process of reproducing data

    [Figure: an Original is copied to a Replica through REPLICATION]

    Local replication is a technique for ensuring Business Continuity by making exact copies of data. With replication, data on the replica will be identical to the data on the original at the point-in-time that the replica was created.

    Examples:

    Copy a specific file

    Copy all the data used by a database application

    Copy all the data in a UNIX Volume Group (including underlying logical volumes, file systems, etc.)

    Copy data on a storage array to a remote storage array


    Possible Uses of Replicas

    Alternate source for backup

    Source for fast recovery

    Decision support

    Testing platform

    Migration

    Replicas can be used to address a number of Business Continuity functions:

    Provide an alternate source for backup to alleviate the impact on production.

    Provide a source for fast recovery, helping to reduce RPO and RTO.

    Decision Support activities such as reporting.

    For example, a company may have a requirement to generate periodic reports, typically once a day or once a week. Running the reports off the replicas greatly reduces the burden placed on the production volumes.

    Developing and testing proposed changes to an application or an operating environment.

    For example, the application can be run on an alternate server using the replica volumes, and any proposed design changes can be tested.

    Data migration.

    Migration can be as simple as moving applications from one server to the next, or as complicated as migrating entire data centers from one location to another.


    Considerations

    What makes a replica good?

    Recoverability

    Considerations for resuming operations with primary

    Consistency/re-startability

    How is this achieved by various technologies

    Kinds of Replicas

    Point-in-Time (PIT) = finite RPO

    Continuous = zero RPO

    How does the choice of replication technology tie back into RPO/RTO?

    Key factors to consider with replicas:

    What makes a replica good:

    Recoverability from a failure on the production volumes. The replication technology must allow for the restoration of data from the replicas to the production and then allow production to resume with a minimal RPO and RTO.

    Consistency/re-startability is very important if data on the replicas will be accessed directly or if the replicas will be used for restore operations.

    Replicas can either be Point-in-Time (PIT) or continuous:

    Point-in-Time (PIT) - the data on the replica is an identical image of the production at some specific timestamp.

    For example, a replica of a file system is created at 4:00 PM on Monday. This replica would then be referred to as the Monday 4:00 PM Point-in-Time copy.

    Note: The RPO will be a finite value with any PIT. The RPO is the interval between the time the PIT was created and the time a failure occurred on the production. If there is a failure on the production at 8:00 PM and there is a 4:00 PM PIT available, the RPO would be 4 hours (8 - 4 = 4). To minimize RPO with PITs, take periodic PITs.

    Continuous replica - the data on the replica is synchronized with the production data at all times.

    The objective with any continuous replication is to reduce the RPO to zero.
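
    The RPO arithmetic for Point-in-Time replicas can be written out as a short calculation. The sketch below is only an illustration of the 4:00 PM / 8:00 PM example; the function name `rpo` and the choice of a January 2006 Monday as the date are assumptions made for the example.

```python
from datetime import datetime, timedelta


def rpo(pit_times, failure_time):
    """Return the time between the latest usable PIT and the failure."""
    usable = [t for t in pit_times if t <= failure_time]
    if not usable:
        raise ValueError("no PIT exists before the failure")
    return failure_time - max(usable)


# A single 4:00 PM PIT and a failure at 8:00 PM on the same Monday.
monday_4pm = datetime(2006, 1, 2, 16, 0)
monday_8pm = datetime(2006, 1, 2, 20, 0)
four_hours = rpo([monday_4pm], monday_8pm)
```

    Taking PITs more often shrinks the result: with hourly PITs from 4:00 PM to 7:00 PM, a failure at 7:30 PM would leave an RPO of 30 minutes.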


    Replication of File Systems

    [Figure: host I/O stack showing Apps, DBMS, Mgmt Utilities, File System, Volume Management, Multi-pathing Software, Device Drivers, and HBAs, with the Operating System Buffer above the Physical Volume]

    Most OS file systems buffer data in the host before the data is written to the disk on which the file system resides.

    For data consistency on the replica, the host buffers must be flushed prior to the creation of the PIT. If the host buffers are not flushed, the data on the replica will not contain the information that was buffered on the host, and some level of recovery will be necessary.

    Note: If the file system is unmounted prior to the creation of the PIT, no recovery would be needed when accessing data on the replica.
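
    At the level of a single file, flushing host buffers can be illustrated as follows. This is a minimal sketch using standard Python calls, and it flushes only the one file being written; flushing an entire file system before a PIT is normally done with operating system or replication tooling, which is outside the scope of this example. The function name `write_and_flush` is an assumption.

```python
import os
import tempfile


def write_and_flush(path, data):
    """Write data and push it from host buffers to the storage device."""
    with open(path, "wb") as f:
        f.write(data)         # data may still sit in application buffers
        f.flush()             # hand the data to the operating system
        os.fsync(f.fileno())  # ask the OS to write its buffers to disk


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        write_and_flush(os.path.join(directory, "data.bin"), b"payload")
```

    Only after the `os.fsync` call returns can a replica of the underlying device be expected to contain the written data.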


    Replication of Database Applications

    A database application may be spread out over numerous files, file systems, and devices, all of which must be replicated

    Database replication can be offline or online

    [Figure: database Data and Logs devices]

    Database replication can be offline or online:

    Offline replication takes place when the database and the application are shut down.

    Online replication takes place when the database and the application are running.


    Database: Understanding Consistency

    Databases/Applications maintain integrity by following the Dependent Write I/O Principle

    Dependent Write: A write I/O that will not be issued by an application until a prior related write I/O has completed

    A logical dependency, not a time dependency

    Inherent in all Database Management Systems (DBMS)

    e.g. Page (data) write is a dependent write I/O based on a successful log write

    Applications can also use this technology

    Necessary for protection against local outages

    Power failures create a dependent write consistent image

    A Restart transforms the dependent write consistent image to transactionally consistent

    i.e. Committed transactions will be recovered, in-flight transactions will be discarded

    All logging database management systems use the concept of dependent write I/Os to maintain integrity. Data in which these dependencies are preserved is said to be dependent write consistent. Dependent write consistency is required for protection against local power outages and the loss of local channel connectivity or storage devices. The logical dependency between I/Os is built into database management systems, certain applications, and operating systems.


    Database Replication: Transactions

    [Figure: a Database Application issues writes 1 through 4 through a Buffer to the Data and Log devices]

    Database applications require that, for a transaction to be deemed complete, a series of writes must occur in a particular order (Dependent Write I/O). These writes are recorded on the various devices/file systems.

    In this example, steps 1-4 must complete for the transaction to be deemed complete.

    Step 4 is dependent on Step 3 and will occur only if Step 3 is complete

    Step 3 is dependent on Step 2 and will occur only if Step 2 is complete

    Step 2 is dependent on Step 1 and will occur only if Step 1 is complete

    Steps 1-4 are written to the database's buffer and then to the physical disks.
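
    The chain of dependencies between steps 1-4 can be sketched in a few lines. The `Device` class and the assignment of individual steps to the log or data device are assumptions made for illustration; the text does not state which step lands on which device. The point of the sketch is only that a write is never issued unless the write it depends on has completed.

```python
class Device:
    """A hypothetical storage device that acknowledges completed writes."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.writes = []

    def write(self, record):
        if not self.healthy:
            return False  # the write did not complete
        self.writes.append(record)
        return True


def issue_dependent_writes(steps):
    """Issue writes in order; stop at the first one that does not complete.

    `steps` is a list of (device, record) pairs. Returns the number of
    writes that completed.
    """
    done = 0
    for device, record in steps:
        if not device.write(record):
            break  # later writes depend on this one, so they are never issued
        done += 1
    return done
```

    If the second write fails, steps 3 and 4 are never sent to any device, which is why a sudden outage leaves the devices in a dependent write consistent state.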


    Database Replication: Consistency

    [Figure: Source and Replica, each with Data and Log devices; writes 1 through 4 are present on both, so the replica is Consistent]

    Note: In this example, the database is online.

    At the point in time when the replica is created, all the writes to the source devices must be captured on the replica devices to ensure data consistency on the replica.

    In this example, steps 1-4 on the source devices must be captured on the replica devices for the data on the replicas to be consistent.


    Database Replication: Consistency

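    [Figure: Source and Replica, each with Data and Log devices; writes 1 through 4 are present on the source, but only writes 3 and 4 reached the replica, so the replica is Inconsistent]

    Note: In this example, the database is online.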

    Creating a PIT for multiple devices happens quickly, but not instantaneously.

    Steps 1-4, which are dependent write I/Os, have occurred and have been recorded successfully on the source devices.

    It is possible that steps 3 and 4 were copied to the replica devices, while steps 1 and 2 were not copied.

    In this case, the data on the replica is inconsistent with the data on the source. If a restart were to be performed on the replica devices, Step 4, which is available on the replica, might indicate that a particular transaction is complete, but not all of the data associated with the transaction will be available on the replica, making the replica inconsistent.
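
    The situation described above can be reproduced with a small simulation. It assumes, purely for illustration, two source devices named "a" and "b", with steps 1 and 2 written to the first and steps 3 and 4 to the second; the real placement of the writes is not specified. The replica copies one device at a time while writes keep arriving, and a helper then checks whether every step on the replica has all of its predecessors.

```python
def create_pit_non_atomic(source, pending_writes):
    """Copy devices one at a time while writes keep arriving.

    `source` maps device name -> list of step numbers already written.
    `pending_writes` is a list of (device name, step) pairs that land
    after the first device has been copied.
    """
    replica = {}
    names = list(source)
    replica[names[0]] = list(source[names[0]])  # first device copied
    for name, step in pending_writes:           # writes land mid-copy
        source[name].append(step)
    for name in names[1:]:
        replica[name] = list(source[name])      # later devices copied
    return replica


def is_dependent_write_consistent(devices):
    """Every step present must have all earlier steps present too."""
    steps = sorted(s for writes in devices.values() for s in writes)
    return steps == list(range(1, len(steps) + 1))
```

    In this run the source ends up holding steps 1-4, while the replica holds only steps 3 and 4, matching the inconsistent case on the slide.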


    Database Replication: Ensuring Consistency

    Off-line Replication

    If the database is offline or shut down and then a replica is created, the replica will be consistent

    In many cases, creating an offline replica may not be viable due to the 24x7 nature of business

    [Figure: Database Application (Offline); Source and Replica each hold Data and Log, and the replica is Consistent]

    Database replication can be performed with the application offline (i.e., the application is shut down, with no I/O activity) or online (i.e., while the application is up and running). If the application is offline, the replica will be consistent because there is no activity. However, consistency is an issue if the database application is replicated while it is up and running.


    Database Replication: Ensuring Consistency

    Online Replication

    Some database applications allow replication while the application is up and running

    The production database would have to be put in a state which would allow it to be replicated while it is active

    Some level of recovery must be performed on the replica to make the replica consistent

    [Figure: Source and Replica, each with Data and Log devices; writes 1 through 4 are present on the source, but only writes 3 and 4 reached the replica, so the replica is Inconsistent]

    In the situation shown, Steps 1-4 are dependent write I/Os. The replica is inconsistent because Steps 1 & 2 never made it to the replica. To make the database consistent, some level of recovery would have to be performed. In this example, this could be done by simply discarding the transaction that was represented by Steps 1-4. Many databases are capable of performing such recovery tasks.
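
    The idea of recovering a replica by discarding an incomplete transaction can be sketched as follows. This is a simplified illustration, not how any particular database implements recovery: writes are modeled as hypothetical (transaction id, step number) pairs, and a transaction is kept only if all of its steps are present on the replica.

```python
def recover(replica_writes, steps_per_txn):
    """Keep only transactions whose writes are all present on the replica.

    `replica_writes` is a set of (transaction id, step number) pairs.
    """
    present = {}
    for txn, step in replica_writes:
        present.setdefault(txn, set()).add(step)
    required = set(range(1, steps_per_txn + 1))
    complete = {txn for txn, steps in present.items() if steps == required}
    return {(txn, step) for txn, step in replica_writes if txn in complete}
```

    A transaction with only steps 3 and 4 on the replica is dropped in its entirety, while a transaction with all four steps is preserved.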


    Database Replication: Ensuring Consistency

    [Figure: Source and Replica; writes 1 through 4 are present on both and the replica is Consistent; a fifth write (5) is also shown]

    An alternative way to ensure that an online replica is consistent is to:

    Hold I/O to all the devices at the same instant.

    Create the replica.

    Release the I/O.

    Holding I/O is similar to a power failure, and most databases have the ability to restart from a power failure.

    Note: Holding I/O to all devices simultaneously ensures that the data on the replica is identical to that on the source devices, but the database application will time out if I/O is held for too long.
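
    The hold, replicate, and release sequence can be sketched with a single lock that every write must pass through. This is only a conceptual model under the assumption that all devices share one I/O gate; the `Volumes` class and its method names are hypothetical and do not represent an array or host replication API.

```python
import threading


class Volumes:
    """Hypothetical set of devices whose writes pass through one gate."""

    def __init__(self, names):
        self._gate = threading.Lock()
        self._devices = {name: [] for name in names}

    def write(self, name, record):
        with self._gate:  # blocks while I/O is held
            self._devices[name].append(record)

    def create_replica(self):
        with self._gate:  # hold I/O to all devices at the same instant
            replica = {n: list(w) for n, w in self._devices.items()}
        # leaving the block releases the held I/O
        return replica
```

    Because the copy is taken while the gate is held, no write can land on one device after another device has already been copied. Keeping the held section short matters for the reason given in the note above: writers wait for as long as the gate is held.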


    Tracking Changes After PIT Creation

    At P