Disk-Based Backup & Recovery:
Making Sense of Your Options
White Paper
Datalink
September 2007
Abstract: Data and storage requirements are growing at unbelievable rates for businesses
of every type. To help counter this, it’s time for organizations to examine the benefits that
disk storage subsystems can provide in data protection environments. This white paper
details four enhanced data recovery (disk-based backup) architectures and provides guid-
ance on how to determine whether or not those architectures would be an overall fit within
the IT infrastructure. The assessment of a solution’s effectiveness examines whether its
overall benefits balance well against its human, corporate, and financial costs. This white
paper offers in-depth insight on what those benefits and costs are so that IT professionals
are armed with the knowledge required to make sound decisions.
Table of Contents
Introduction
  Tape: the traditional approach to data recovery
  Why disk makes sense for data recovery now
  Paper topics
Data recovery bottlenecks and their impact on recovery operations
  What is a bottleneck?
  Potential bottleneck areas
  Improper assessment of bottlenecks
Eliminating bottlenecks with enhanced data recovery solutions
  What is enhanced data recovery?
  First analyze the pain points
  Can disk play a role?
  Will disk completely replace tape?
  The cost of archiving
  The cost of off-site media storage
  Disk is just another tool
Enhanced Data Recovery Architectures
  Categories
  The enhanced data recovery continuum
Disk to disk (D2D)
  How does D2D work?
  Synthetic full backups
  Challenges
  Operational benefits
  Best fit
  Business value impact of D2D
  Example of a D2D implementation
Virtual tape
  How does it work?
  Vendor approaches to virtual tape
  Challenges
  Operational benefits
  Best fit
  Business value impact of virtual tape
  Example of a virtual tape implementation
Point-in-time copies
  Types of point-in-time copies
  Full image mirroring technology
  Mirror management
  Snapshot edits, additions, and disk space requirements
  Challenges
  Operational benefits
  Best fit
  Business value impact of point-in-time copies
  Example of a point-in-time copy solution implementation
Continuous data protection
  What is continuous data protection?
  How it works
  Challenges
  Operational benefits
  Best fit
  Business value impact of CDP
  Example of CDP implementation
Adding depth to enhanced data recovery
  Data deduplication
  Replication
Summary
  Compromise no longer necessary
  Understand the bottlenecks and pain points
  Role of disk
  Technologies
  Partnership with Datalink
Disk-Based Backup & Recovery: Making Sense of Your Options
Introduction
Tape: the traditional approach to data recovery
For decades, organizations have protected critical, electronically stored data
by making a second copy of the data on tape. Over that time, incremental
progress has been made in improving this process. Tape drives have grown in
capacity and performance. Software has become more sophisticated, allowing
creative ways to back data up faster while using fewer resources. Tape
libraries using robotic automation have become common, enabling lights-out
backup operations in large environments. Storage area networks (SANs)
enable tremendous scalability with robust performance. Ultimately, however,
these advances have not kept pace with the business expectations placed on
IT organizations, and therefore the market has screamed for enhancements.
The sources of pain come from many areas, but they can be grouped into the
following categories:
• Data growth
• Backup windows
• Recovery time objectives (RTOs)
• Recovery point objectives (RPOs)
• Decentralization of data
• Rising costs and flat budgets
• Compliance requirements and legal discovery needs
Why disk makes sense for data recovery now
While tape-based architectures have long been the primary technology of
choice for backup and recovery, they have not kept pace with the business
expectations placed on IT organizations.
When comparing the costs of raw ATA or Serial ATA drives to those of LTO
tape media, the difference has been more than 10X in favor of tape in the
past. In today’s market, however, the gap has narrowed and low-end disk is
now less than 2X the cost per gigabyte of tape storage. As a result, organiza-
tions are increasingly augmenting tape-based backup and recovery architec-
tures with disk-based solutions, resulting in measurable performance, relia-
bility, and manageability enhancements. It’s important to note though that
tape technology has not stood still, so further reduction in the relative cost
between raw disk capacity and raw tape capacity cannot be counted on to
drive down the overall cost of disk-based systems.
Still, with rapidly declining disk prices and the introduction of low-cost,
relatively high-performance RAID-protected ATA disk subsystems, the storage
industry is now spinning about enhanced data recovery (disk-based) solutions,
which raises these questions:
2007 Datalink. All Rights Reserved. www.datalink.com 1
• How much performance improvement can be realized in data recovery
utilizing disk?
• Now that the price of disk has made it a practical consideration for data
recovery, is tape needed?
• Will recovery operations be simpler with disk?
The answer to those questions is one that nobody wants to hear: It depends.
Storage environments contain many variables, so each organization must
assess its own environment and carefully prioritize its business
objectives based on that assessment.
Implementing a data recovery solution that meets business objectives and
SLAs, especially around restoring data, is a tremendous challenge for IT
managers. For example, if an organization is unable to meet its internal and
external SLAs, it could pay a hefty price for this shortcoming. These costs
could include lost revenue, lost productivity, penalties and litigation costs,
and degraded customer and business partner experiences and confidence.
Paper topics
The topics in this paper include:
• Environmental variables (how bottlenecks impact data recovery opera-
tions)
• Enhanced data recovery approaches (how can new technologies be used
to augment tape for data recovery?)
• Enhanced data recovery architectures (the categories, benefits, and best
fit)
Data recovery bottlenecks and their impact on recovery operations
What is a bottleneck?
In its broadest perspective, a bottleneck is the step or process within every
system that imposes a delay on the ultimate throughput of that system. A bot-
tleneck can exist in any area of a system, including people, process, or tech-
nology. All systems, including data recovery systems, contain bottlenecks.
When a bottleneck is removed, another one often appears, or worse, another
one is actually created by improperly addressing the original bottleneck.
When determining if disk technology can be implemented to address data
recovery bottlenecks by improving performance or reliability, it is vital to
fully assess the entire data recovery operation, and accurately pinpoint where
the bottlenecks occur. The goal is to systematically remove bottlenecks until
service level agreements (SLAs) can be met and quality of service (QoS)
goals are achieved.
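The idea that removing one bottleneck exposes the next can be illustrated with a minimal sketch. It assumes a backup path modeled as a chain of stages whose end-to-end throughput is capped by the slowest stage. The stage names and MB/s figures are hypothetical and chosen only for illustration; they are not measurements from this paper.

```python
# Hypothetical stage throughputs in MB/s (illustrative values only).
stages = {
    "file system read": 40,
    "LAN": 100,
    "backup server": 120,
    "SAN": 200,
    "tape drive": 80,
}


def find_bottleneck(stages):
    """Return (name, throughput) of the slowest stage, which caps the whole chain."""
    name = min(stages, key=stages.get)
    return name, stages[name]


print("Current bottleneck:", find_bottleneck(stages))

# Upgrading a stage that is not the bottleneck leaves end-to-end throughput unchanged.
upgraded_tape = {**stages, "tape drive": 160}
print("After tape drive upgrade:", find_bottleneck(upgraded_tape))

# Fixing the real bottleneck raises throughput, and a different stage becomes the limit.
fixed_fs = {**stages, "file system read": 150}
print("After file system fix:", find_bottleneck(fixed_fs))
```

In this toy model, the tape drive upgrade changes nothing, while fixing the file system moves the limit to the tape drive. Real environments are harder to model, but the same reasoning applies when deciding which component to measure first.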
Potential bottleneck areas
Bottlenecks can be found in data recovery operations in the following areas:
• Procedure
• Software capabilities and configuration
• Hardware capabilities and configuration
• LAN performance and saturation levels
• SAN configurations
Improper assessment of bottlenecks
Often bottlenecks are inaccurately diagnosed in one component or area of the
data recovery operation. This leads to an investment to improve the capabili-
ties of the problem component or area. However, due to improper diagnosis,
the upgrade will not yield a net performance improvement because it was not
the cause of the bottleneck.
For example, an organization finds that its tape I/O performance is not ade-
quate during a backup operation, so it upgrades the tape drives. When the
tape drive upgrade does not improve backup performance, the organization
implements a SAN to increase the bandwidth to the drives. Still frustrated,
the organization next replaces the host bus adaptors (HBAs) in the servers.
The performance is still slow so the organization deduces that the issue must
be the backup application. The organization upgrades the application and the
problem still exists.
Finally, the organization realizes that there are too many files in one directory.
This directory bogs down the file system, preventing the server from sending
data downstream any faster than before. Once the real bottleneck is
fixed, though, a new one appears. This new bottleneck might be resolved by
introducing disk into the environment. Even so, disk as
a backup target clearly would not have addressed the root cause of the original
problem (too many files in one directory), even if it provided some symptomatic
relief. For that matter, any one of the earlier fixes could have exacerbated
or further masked the true problem.
This example illustrates the need for a systematic, objective-based approach
to assessing problems that occur in data recovery operations.
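As one small piece of such an assessment, the root cause in the example above (too many files in one directory) can be checked directly before any hardware is purchased. The sketch below is an assumption about how an administrator might do this with the Python standard library; the function name and the default threshold of 10,000 files are hypothetical, and a sensible threshold depends on the file system in use.

```python
import os


def crowded_directories(root, threshold=10_000):
    """Yield (path, file_count) for each directory under root holding more than threshold files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if len(filenames) > threshold:
            yield dirpath, len(filenames)
```

Calling list(crowded_directories("/data")) on a hypothetical data volume would return the directories that deserve a closer look.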
Eliminating bottlenecks with enhanced data recovery solutions
What is enhanced data recovery?
Enhanced data recovery is a data backup and recovery architecture that adds
a disk-based storage array (combined with a sophisticated software technolo-
gy) to a traditional tape-only solution. It enables the concept of backing up to
disk and archiving to tape.
First analyze the pain points
Before deciding to introduce disk into the data recovery equation, it is impor-
tant to analyze and prioritize objectives based on pain points, and then deter-
mine the desired outcome. Typical objectives for a disk-based enhanced data
recovery project should come from the organization’s pain points:
Data growth. No organization is exempt from the pressures of managing,
storing, and protecting the ever-increasing amounts of data.
Backup window. Backup window is defined as the time available for IT
administrators to slow down or stop production to perform data recovery
operations. In today’s 24x7xforever business, organizations are increasingly
finding that their backup windows have become either extremely compressed
or extinct. They no longer have the luxury of shutting down applications, or
sending out emails to users asking them to wait until further notice to log in.
Recovery time objective (RTO). RTO is the goal that the organization sets
for the time it takes to fully restore an application with its data when a failure
or data loss occurs. Business executives measure system downtime in terms
of hours and dollars; IT managers measure it in terms of high blood pressure
and gray hair. Either way, it is no secret that the costs are both extremely
high and on the rise. When a system goes down and perhaps loses its data,
there is an expectation that the system will be recovered within a reasonable
amount of time. Business units and IT staff often debate what might be
deemed reasonable, given the available technology and staff. Usually, when an
RTO is established for a particular application, it represents an ambitious
goal, placing strain on the IT staff. (See Figure 1.)
Recovery point objective (RPO). RPO is defined as the point in time to
which data can be recovered. For example, if a backup is performed at midnight,
and the data is used to recover a system at noon the next day, the recovery
point would be midnight. RPO therefore measures the data lost between the
last backup and the time of data loss or corruption. The practical approach to
determining RPO for a given application would be to ask these questions:
“How much data can we afford to lose; and what is the investment cost to
protect the data to that level?” To fully establish this, it is important to esti-
mate how much data is generated during an hour of production, and what
would be the cost to recreate this data (if possible) when lost or corrupted.
This information can help determine the risk-adjusted ROI for technologies
to improve the ability to recover data to a more recent point in time. (See Figure 1.)
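The midnight and noon example can be worked through as simple arithmetic. The sketch below assumes the failure occurs 12 hours after the midnight backup; the calendar date and the cost of recreating one hour of data are hypothetical placeholders, not figures from this paper.

```python
from datetime import datetime


def data_loss_window(last_backup, failure_time):
    """Time span of data written after the recovery point and therefore lost."""
    return failure_time - last_backup


def cost_of_loss(window, recreate_cost_per_hour):
    """Estimated cost to recreate the lost data, given an hourly recreation cost."""
    hours = window.total_seconds() / 3600
    return hours * recreate_cost_per_hour


last_backup = datetime(2007, 9, 4, 0, 0)   # backup taken at midnight (recovery point)
failure = datetime(2007, 9, 4, 12, 0)      # system recovered from that backup at noon

window = data_loss_window(last_backup, failure)
print("Data loss window:", window)
# 2,000 per hour is a hypothetical cost to recreate one hour of production data.
print("Estimated cost to recreate:", cost_of_loss(window, 2_000))
```

Comparing this estimate with the cost of a technology that shortens the window gives a rough starting point for the risk-adjusted ROI discussion.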
Figure 1: The cost of a data protection solution is relative to how quickly an organization needs to restore the data and how much data it can afford to lose.
Decentralization of data. In recent years, storage hardware costs have
declined more rapidly than WAN bandwidth costs. And over long distances,
latency issues may preclude centralization of resources. For these reasons,
companies often decentralize data to provide adequate performance in remote
offices, without requiring the costly bandwidth necessary to run data-inten-
sive applications on centralized storage. This introduces a significant chal-
lenge to provide adequate protection for decentralized data. Many companies
place tape resources at remote sites and designate the most technical person
in that office to take responsibility to maintain the data recovery operation.
This approach is prone to error, which could result in significant data losses.
Rising costs, flat budgets, and static headcount. Virtually all IT organiza-
tions feel the pressure of too much data and not enough resources to manage
and protect it adequately. IT organizations are expected to keep meeting the
same data recovery SLAs, without adding headcount, even as the volume
of data grows rapidly. While tape provides a
good archive solution for many organizations, it can become an operational
bottleneck in environments with frequent simultaneous backup and recovery
operations. This pressure fuels the cry for data recovery technologies that
allow greater scalability in the data center within constrained IT budgets.
Compliance requirements and legal discovery needs. Data must be secure,
unalterable, and in some cases immediately accessible in order to comply
with a myriad of regulations. In addition, litigation support and e-discovery
have emerged as major IT and business challenges.
Can disk play a role?
Clearly defined and prioritized project objectives based on the organization’s
pain points and an understanding of the performance bottlenecks within a
data recovery environment serve as the foundation for determining whether
disk can play a role in the data recovery operations. Depending on the identi-
fied bottlenecks and the established objectives, disk technologies can be
added to the data recovery equation in many different ways. This can be done
to achieve efficiencies in data recovery operations.
Will disk completely replace tape?
The answer is probably not yet. In most environments, disk will complement,
rather than replace, tape in near-term implementations. Tape remains a low-cost
alternative to disk in two areas: 1) economical archiving, due to its low
incremental cost and portability; and 2) foundational offsite disaster recovery
storage.
The cost of archiving
When organizations evaluate the cost of disk and tape, their comparisons
often focus on the simple cost per gigabyte of storage. They often overlook
the ongoing operational cost with each medium for a given application. For
archival purposes, there can be a great premium in the costs of managing
disk compared to those for managing offline tape. When measuring cost fac-
tors for data stored on disk, organizations should consider:
• Power consumption costs linked with keeping the spindles turning and
keeping systems cool
• Wear-and-tear on all components of the array
• Data center floor space
• General storage management costs linked to disks
In contrast, archived tape offers:
• Greater portability
• Little power consumption
• Cheaper storage space
• Less management effort over the course of an extended archival life cycle
The cost of off-site media storage
Most organizations have procedures in place to rotate a third set of tapes to
an offsite vault location mainly for use after a disaster, but also in the event
of a double-media failure in the data center. While this procedure could be
deemed expensive compared to not having a disaster recovery plan in place,
it is often economical compared to having live data replicated to a disk array
at a remote hot site. With this approach, the cost of the incremental disk to
manage the process (in lieu of tape) could be nominal. However, the total
cost of this approach would include disk, software, servers, adequate band-
width, remote data center, and management costs.
It is clear that the total cost of this approach can be expensive. This is true
particularly if the capabilities of tape can adequately satisfy the organiza-
tion’s requirement for recovery time and recovery point performance in the
event of a site disaster.
Disk is just another tool
It is evident that disk is not the silver bullet, but merely one more tool that
can be leveraged to improve an organization’s data recovery operations. If it
is determined that disk should play a role, the next step would be to architect
a solution that applies the right technologies to resolve the bottlenecks and to
satisfy the data recovery pain points that the organization is experiencing.
Enhanced Data Recovery Architectures
Categories
Datalink segments enhanced data recovery architectures into four categories:
1. Backup to disk, which is sometimes referred to as D2D (disk-to-disk) or
D2D2T (disk-to-disk-to-tape).
2. Virtual tape, where software is used to present a virtual interface to disk,
making it appear to backup applications as a tape device.
3. Point-in-time copy, where an image of the data, such as a full mirror or
pointer-based snapshot, is stored on a RAID subsystem.
4. Continuous data protection, which features the ongoing sequential capture
of each data write transaction committed to a production volume. These
transactions are held on a secondary disk volume with a logging feature
that allows rolling back to a desired point in time.
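The fourth category, continuous data protection, can be sketched as a journal of write records that is replayed up to a chosen moment. This is a minimal illustration of the logging and rollback idea only, not any vendor's implementation; the class names, the block-level granularity, and the numeric timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteRecord:
    timestamp: float   # when the write was committed (arbitrary time units)
    block: int         # block address on the production volume
    data: bytes        # contents written


class Journal:
    """Hypothetical secondary-volume log of every write made to a production volume."""

    def __init__(self, baseline):
        self.baseline = dict(baseline)   # block -> data at the moment journaling began
        self.records = []                # write records, kept in commit order

    def capture(self, timestamp, block, data):
        """Append one committed write to the log, enforcing sequential order."""
        if self.records and timestamp < self.records[-1].timestamp:
            raise ValueError("writes must be captured in time order")
        self.records.append(WriteRecord(timestamp, block, data))

    def image_at(self, point_in_time):
        """Rebuild the volume as it looked at point_in_time by replaying the log."""
        image = dict(self.baseline)
        for record in self.records:
            if record.timestamp > point_in_time:
                break
            image[record.block] = record.data
        return image


journal = Journal({0: b"A", 1: b"B"})
journal.capture(10, 0, b"A2")        # a legitimate update
journal.capture(20, 1, b"corrupt")   # a bad write, for example from a faulty application
print(journal.image_at(15))          # volume state just before the bad write
```

Rolling back to time 15 keeps the legitimate update to block 0 and discards the later bad write to block 1.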
The enhanced data recovery continuum
At a very basic level, data is protected by making copies of the production
data. The method used to make these copies determines the level at which the
data is protected as well as the costs required to reach that level. The
enhanced data recovery continuum illustrates the range of methods available
to deliver increasing levels of data protection. How quickly a copy can be
made determines how often the copies occur, which dictates the maximum
potential data loss. More specifically, all transaction records that are captured
after the last copy was made are potentially lost if a disruption occurs.
Because of budget limitations, most enterprises are not able to deploy the
most sophisticated data protection solution available. Firms need to define an
RPO and RTO for each of their applications and then select the most appro-
priate technology from a continuum of data protection technologies — both
disk and tape.
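The relationship between copy speed, copy frequency, and potential data loss can be expressed as a small selection exercise. The sketch below assumes copies run back to back, so the interval between copies (and the worst-case data loss) is at least the time one copy takes. The method names, copy times, and relative costs are hypothetical placeholders; they do not describe any specific technology on the continuum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyMethod:
    name: str
    copy_hours: float     # time to complete one copy (hypothetical)
    relative_cost: int    # unitless cost ranking (hypothetical)

    @property
    def worst_case_loss_hours(self):
        # Everything written since the last completed copy can be lost, and
        # copies cannot complete more often than one copy takes to make.
        return self.copy_hours


METHODS = [
    CopyMethod("slow copy", 24.0, 1),
    CopyMethod("fast copy", 1.0, 3),
    CopyMethod("near-continuous copy", 0.01, 6),
]


def cheapest_method(rpo_hours, methods=METHODS):
    """Return the lowest-cost method whose worst-case loss fits the RPO, or None."""
    candidates = [m for m in methods if m.worst_case_loss_hours <= rpo_hours]
    return min(candidates, key=lambda m: m.relative_cost) if candidates else None


for rpo in (48, 4, 0.001):
    print(rpo, cheapest_method(rpo))
```

An application that tolerates two days of loss is served by the cheapest option, a four-hour RPO forces a faster and costlier method, and an RPO tighter than any available method returns None, which signals that the objective or the budget needs to be revisited.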
[Figure: Enhanced Data Recovery Continuum. Labels: Conventional Disk; Virtual Tape Library; Point-in-Time Copy; Continuous Data Protection; Data Deduplication; Replication. Axes: Recovery Point (days, hours, seconds) and Recovery Time (seconds, hours, days) on either side of "Failure Occurs". Cost labels: Cost of Protection, Cost of Lost Data, Cost of Recovery, Cost of Time.]
Figure 2: Organizations implement data protection technologies from somewhere on the continuum shown in the arrows above.
Disk to disk (D2D)
How does D2D work?
When backing up to disk, enterprise backup software directs data from server
clients to a disk-based storage array. The data is stored on the disk before
being staged or cloned to a tape library for secondary protection. Backing up
to disk benefits from the random access nature of disks, eliminating the slow
sequential seek process of tape, which leads to faster file-level restores and
improved reliability. (See Figure 3.)
Synthetic full backups
Backup to disk can also provide an improved foundation for performing
synthetic full backups. Synthetic full backups allow organizations to perform
one full backup, followed by ongoing incremental backups. The full backup is
then merged with the incremental backups as a background activity, resulting
in an updated full backup. This eliminates the need to perform regular full
backups, which can consume tremendous bandwidth and bring production to a
grinding halt.
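The merge at the heart of a synthetic full backup can be sketched in a few lines. The example assumes each backup image is a simple mapping from file path to file contents and that deletions are recorded in the incremental with a marker. Both are simplifications made for illustration; real backup applications use their own catalog and image formats.

```python
# Marker an incremental uses to record that a file was deleted (an assumption of this sketch).
DELETED = object()


def synthesize_full(full, incrementals):
    """Merge a full backup with its incrementals (oldest first) into a new full image."""
    synthetic = dict(full)
    for incremental in incrementals:
        for path, content in incremental.items():
            if content is DELETED:
                synthetic.pop(path, None)   # file no longer exists on the client
            else:
                synthetic[path] = content   # new or changed file replaces the old version
    return synthetic


full = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
first_incremental = {"a.txt": "v2"}
second_incremental = {"b.txt": DELETED, "d.txt": "v1"}

print(synthesize_full(full, [first_incremental, second_incremental]))
```

The merge reads only existing backup images, so no data has to be pulled from the production servers again.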
Synthetic full backups can be done without using disk in a traditional tape
environment, but the impact on resources during image consolidation can be
significant. Regularly merging backup images into a single set means repeatedly
loading and unloading tapes, which adds wear and tear on tape and library
resources.
Also, tape technology does not allow multiple simultaneous reads or writes
from a single tape or drive, so any tape resource being used for a synthetic full
backup is unavailable during that period. Disk, however, can provide the
server with online access to data for making merged backup images, minimizing
the impact on tape resources. Because the backup images are stored on disk,
they remain available for simultaneous reads for recovery operations
during a synthetic full backup.
[Figure: diagram labeled Primary Storage, Backup System, Disk Staging, Tape Library, Fast (Primary) Recovery, and Slower (Secondary) Recovery.]
Figure 3: Disk-to-disk recovery is fast and reliable.
Challenges
Disk can yield improvements in a data recovery operation, but there are some
challenges.
First, to fully exploit the capabilities of D2D, it is critical that the backup
application supports the desired functionality. Most commercial backup soft-
ware products were originally designed to use tape as the exclusive backup
medium. For most products, only recent upgrades make it feasible to use disk
as a viable destination for backup data. Also, most file systems used to
address disk were designed for random I/O. This works well for general file
access and database use, but it is not optimal for the sequential, large-block
data transfers that are characteristic of backup applications. The mismatch
introduces a bottleneck and throttles back the performance of the disk
subsystem.
Another limiting factor to performance in a D2D configuration is the way
most file systems reclaim and reuse space on a volume after data has been
deleted. The file system will generally write new blocks of data to the
locations freed by deleted data as those blocks become available, without
regard for where they physically sit on the disk surface. This method of
writing to disk causes a condition known as fragmentation and leads to
performance loss, which worsens over time as the data on a volume becomes
increasingly fragmented.
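A toy simulation shows how reusing freed blocks scatters a new backup image across the volume. It assumes a volume of 16 blocks and a simple policy of writing into the first free blocks found. Both are deliberate simplifications; real file system allocators are more sophisticated, and the function names here are hypothetical.

```python
def allocate(volume, file_id, size):
    """Write size blocks for file_id into the first free blocks found on the volume."""
    if volume.count(None) < size:
        raise RuntimeError("not enough free space on the volume")
    written = 0
    for index, owner in enumerate(volume):
        if written == size:
            break
        if owner is None:
            volume[index] = file_id
            written += 1


def delete(volume, file_id):
    """Free every block owned by file_id."""
    for index, owner in enumerate(volume):
        if owner == file_id:
            volume[index] = None


def extents(volume, file_id):
    """Count the contiguous runs of blocks that belong to file_id."""
    runs, previous = 0, None
    for owner in volume:
        if owner == file_id and previous != file_id:
            runs += 1
        previous = owner
    return runs


volume = [None] * 16
for name in "ABCD":
    allocate(volume, name, 3)    # four backup images written back to back

delete(volume, "A")              # two older images expire
delete(volume, "C")
allocate(volume, "E", 8)         # the new image fills the holes, then spills to the end

print(volume)
print("Extents for E:", extents(volume, "E"))
```

The new image ends up in three separate pieces, so reading it back sequentially requires extra seeks. Repeating the cycle of expiry and reuse increases the number of pieces over time.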
A third challenge is the cost of conventional disk, which has limited its use to
only the most important data and the most recent backups. Acquisition costs
for conventional disk have been high due to the lack of advanced
storage-efficient technologies. However, with the introduction of data
deduplication technologies, the cost is significantly lower and the ROI
correspondingly stronger.
Operational benefits
In D2D backup, traditional disk storage—often ATA or SATA based—can be
configured as the target for recovery data. By utilizing disk technology as
part of a data recovery infrastructure, an IT operation can improve its ability
to meet its backup windows, improve RTO, increase reliability, and provide
greater functionality. Although this approach can yield backup performance
benefits in some environments, its primary benefits are backup reliability and
recovery performance. Datalink recommends that a copy of the data be writ-
ten to tape or replicated offsite for disaster recovery and archive purposes.
Best fit
D2D recovery can give an organization a greater ability to meet its RTOs.
This is noticeable when having to recover individual or small file sets on a
regular basis. Depending on what bottlenecks exist in the environment, D2D
can also improve the ability to meet a defined backup window. D2D also has
a reliability advantage over tape, which can lead to some RPO benefit in an
environment with a high failure rate on backup jobs. When a tape backup job
fails, the previous evening’s backup becomes the recovery point, exposing up
to 48 hours of potential data loss.
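The 48-hour figure follows from simple arithmetic, which the short Python sketch below makes explicit. It assumes one backup job every 24 hours; the function name and parameters are illustrative and not part of any backup product.

```python
def worst_case_exposure_hours(backup_interval_hours, consecutive_failures):
    """Age of the newest usable backup just before the next job completes.

    Assumes jobs run at a fixed interval (an assumption for illustration).
    Each failed job pushes the recovery point back by one more interval.
    """
    return backup_interval_hours * (consecutive_failures + 1)


# No failures: the newest backup is at most one interval old.
normal = worst_case_exposure_hours(24, 0)
# One failed nightly job: the previous evening's backup is the recovery point.
one_failure = worst_case_exposure_hours(24, 1)
print(normal, one_failure)
```

With a nightly schedule, a single failed job doubles the worst-case exposure from 24 to 48 hours, which is why a high job failure rate translates directly into a weaker RPO.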
In addition, D2D offers benefits in environments where backup is performed
over low-performing or saturated networks and tape streaming is difficult to
achieve, or where interleaving must be used to keep the tape drives streaming.
In these environments, the data will be accepted by the recovery disk at
whatever speed the data is delivered. This stops the shoe-shining effect
(common to many low-end and mid-range tape drives), thereby greatly improving
system throughput.
Performance advantages can also be achieved at the high end of the spectrum,
where disk offers greater data configuration flexibility on primary storage,
such as with database systems. Once the system is optimized, disk can
deliver performance comparable to or greater than that of tape with less
ongoing tweaking and tuning.
Business value impact of D2D
In addition to the operational benefits within IT, organizations experience the
following business benefits associated with the successful implementation of
D2D backup:
- Reduced risk to business viability due to unprotected data as a result of
failed backups
- Tape media cost savings due to reduced tape media consumption
- Higher employee productivity due to shortened downtime in the event
that data needs to be recovered from a backup source
- Improved customer satisfaction as customers experience fewer outages
from the business and their inquiries are answered more quickly because
employees will have improved access to critical data
Example of a D2D implementation
Problem: A medium-sized IT services organization faced an ongoing chal-
lenge of lackluster backup operations performance. Due to complex applica-
tion configurations and overburdened servers, the organization was unable to
deliver data to its tape drives at a fast enough rate to stream the devices,
which caused excessive wear and tear on the tape drives and media, and dra-
matically impacted the overall throughput of the system. Also, the tape dupli-
cation process for disaster recovery was unacceptably slow given the envi-
ronmental limitations. As a workaround, multiplexing was implemented as a
means of aggregating multiple data streams in an effort to stream the tape
drives. While this offered some relief on backup operations, it had negative
ramifications on restore performance due to the greater amount of interleaved
data that needed to be read during a system restoration.
Solution: Careful analysis of this environment identified multiple bottle-
necks that could be addressed to improve the situation. Given a variety of
cost and implementation process variables, it was determined that the most
cost effective option, and the measure that would also deliver the most bene-
fit, was a D2D architecture. Secondary disk has been implemented as the ini-
tial destination for backup jobs. From that location, backup data is subse-
quently cloned to tape in a process that is controlled by the backup applica-
tion. A copy of backup data is maintained on disk for a period of time where
it is accessible for restore operations.
Results: The implementation of D2D required some minor modification of
the backup environment, but the overall effort was not excessive. The organi-
zation experienced roughly a 60% reduction in the amount of time necessary
to complete a backup and a noticeable decrease in the wear and tear on tape
resources, as the cloning process consistently results in full streaming of the
tape drives. The system was architected so that approximately 90-95% of
recoveries are sourced from disk. Tape recoveries are extremely rare. Disk-
based recoveries occur much more quickly than tape recoveries, which
allows the IT organization to meet its SLAs to the business units. The backup
and recovery operation can be delivered much more consistently and requires
60-70% fewer administrative hours, leading to greater productivity within IT.
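The restore penalty of the multiplexing workaround described in the problem statement of this example can be illustrated with a small Python sketch. It assumes blocks from each client are interleaved round-robin onto a single tape; the client names, block labels, and function names are hypothetical.

```python
from itertools import zip_longest


def multiplex(streams):
    """Interleave blocks from several client streams round-robin onto one tape."""
    tape = []
    for group in zip_longest(*streams.values()):
        for client, block in zip(streams.keys(), group):
            if block is not None:  # shorter streams simply run out
                tape.append((client, block))
    return tape


def blocks_read_to_restore(tape, client):
    """Count tape blocks passed over until the client's last block is read."""
    last = max(i for i, (owner, _) in enumerate(tape) if owner == client)
    return last + 1


# Four hypothetical clients, four blocks each, multiplexed to one drive.
streams = {
    "app1": ["a0", "a1", "a2", "a3"],
    "app2": ["b0", "b1", "b2", "b3"],
    "app3": ["c0", "c1", "c2", "c3"],
    "app4": ["d0", "d1", "d2", "d3"],
}
tape = multiplex(streams)
needed = len(streams["app1"])
read = blocks_read_to_restore(tape, "app1")
print(f"restore app1: {needed} blocks needed, {read} blocks read from tape")
```

In this toy layout the restore has to pass over roughly one block per multiplexed client for every block it actually needs, which is the "greater amount of interleaved data" that slowed restores in the example.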
Virtual tape
How does it work?
Another approach to introducing disk into the recovery infrastructure is to
use emulation technology as a front-end to the disk system. This presents
disk to the backup application in a way that makes it appear as tape. In many
cases this approach (as shown in Figure 4) is the least disruptive of the four
discussed in this paper.
[Figure 4 diagram labels: Primary Storage, Backup System, Server, Tape Emulation Disk, Tape Library, Fast (Primary) Recovery, (Secondary) Recovery]
Figure 4: Virtual tape is typically easy to integrate.
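To make the emulation idea concrete, the following minimal Python sketch presents a disk file through a tape-like interface: records are written and read sequentially, and the "tape" can be rewound. The class name, method names, and record format are hypothetical and do not reflect any vendor's product; real virtual tape systems emulate tape drives and libraries at the device level.

```python
import os
import tempfile


class VirtualTape:
    """Hypothetical tape-like device: sequential records backed by a disk file."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the backing file if it is absent
        self.position = 0         # byte offset of the emulated tape head

    def rewind(self):
        self.position = 0

    def write_record(self, data):
        """Write a length-prefixed record at the head, discarding what follows."""
        with open(self.path, "r+b") as f:
            f.seek(self.position)
            f.truncate()  # like tape, writing invalidates later data
            f.write(len(data).to_bytes(4, "big") + data)
            self.position = f.tell()

    def read_record(self):
        """Read the next record, or return None at end of data."""
        with open(self.path, "rb") as f:
            f.seek(self.position)
            header = f.read(4)
            if len(header) < 4:
                return None
            data = f.read(int.from_bytes(header, "big"))
            self.position = f.tell()
            return data


with tempfile.TemporaryDirectory() as d:
    tape = VirtualTape(os.path.join(d, "vtape0.img"))
    tape.write_record(b"backup-set-1")
    tape.write_record(b"backup-set-2")
    tape.rewind()
    records = [tape.read_record(), tape.read_record(), tape.read_record()]
print(records)
```

Because the caller only ever sees rewind, write, and read operations, software written for tape does not need to know that the medium is disk, which is the property that makes this approach minimally disruptive.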
Vendor approaches to virtual tape
Vendors have taken a couple of different approaches to bundling this technology
and the role that it can play in the recovery operation. The following items
characterize some of the differentiation in the marketplace:
Integrated Virtual Library Solution: Several traditional storage vendors as
well as new technology companies have integrated software, server, and disk
into a packaged solution. This approach offers a pre-configured solution that
has been integrated and tested with specific backup software and server plat-
forms. It also provides a single point of support for the complete solution
(i.e., one throat to choke).
Software Only: Other virtual tape products come as a software-only solution.
With this approach, the software is integrated on a server supplied by the
customer or the software vendor, with Fibre Channel cards and network
connections. This allows the customer to choose the optimal server
architecture for its data volumes, performance needs, infrastructure, etc. In
addition, it provides the ability to select which open systems disk product
best fits the environment.
This approach also offers the ability to utilize the newest software functional-
ity available from the software vendor (features that are not yet included in
the integrated VTL solutions). It can also support backup server platforms
that are not yet supported in an integrated solution.
Data Mover or Not: Virtual tape vendors take differing positions on whether
the virtual tape technology should be a passive or active storage device. In
the passive approach, the virtual tape technology allows the backup applica-
tion and server to take responsibility for migrating data from disk to tape for
archival purposes. In the active approach, it can manage this task with or
without direction from the backup server. The benefit of having the backup
server manage this I/O is that seamless command and control is maintained
in the environment and the risk of having metadata and subsequent data
integrity challenges in the backup application is minimized. The advantage of
having the virtual tape product manage this data migration is that the I/O can
occur without passing through the backup application server, which allows
the server to perform its other tasks without being impacted. Current best
practices favor leaving the backup application in charge of all data movement
from disk to tape to simplify management and assure metadata integrity.
Challenges
Virtual tape is a mainstream solution with many vendors offering products
that support this capability. Caution must be taken to ensure that all backup
applications in the environment support this technology. Also, the virtual tape
technology should provide the ability to emulate a tape resource that is
appropriate for the targeted environment.
Another factor to consider is that tape may be used less efficiently when
data is moved from virtual to physical tape, depending on the backup
application and how it leverages compression.
Operational benefits
Virtual tape offers all the same benefits as D2D and addresses some of its
shortcomings.
• It enables seamless integration of disk into the data recovery operation
versus D2D, which generally requires some re-engineering of the data
recovery workflow.
• In some highly complex customized environments, virtual tape makes it
possible to consider using disk as a backup destination, as the effort to
modify all the scripts needed to use standard D2D would be far too dis-
ruptive.
• Virtual tape can result in measurable performance gains versus D2D. This
is because it does not have the same file system overhead and disk frag-
mentation typically associated with disk systems. Another reason is it
enables data to be transferred to disk in large blocks, similar to how data
is typically written to tape, rather than small blocks more typically associ-
ated with transfer to standard disk. And since disk systems are random
access devices, the importance of maintaining streaming performance,
which is critical to tape system performance, is eliminated.
Best fit
In assessing the best fit for virtual tape technology, it is important to note if
the technology is being compared to tape or D2D. For this discussion, the
attributes of virtual tape are compared primarily with those of a D2D archi-
tecture. The optimal environment for virtual tape technology is one that has a
considerable level of customization to the data recovery workflow, where the
introduction of standard D2D would be too disruptive.
Other characteristics of an environment that would lead to the introduction of
virtual tape include:
• Intense backup window pressure and a strong requirement for improved
performance, where disk fragmentation of standard D2D would be a con-
sideration
• Preservation of investment in existing backup software and tape systems
• Use of disk and tape in a tiered backup strategy
• Reduction of reliance on physical tape (physical tape archived off-site for
disaster recovery)
• Need for quick restore of recent backups
• Need for additional tape resources to eliminate tape device bottlenecks
Business value impact of virtual tape
As a starting point, organizations can expect to receive the same business
benefits as with D2D. In addition to these, companies may also experience:
• Lower IT expense for the initial implementation due to the ease of imple-
mentation relative to D2D
• Improved user access to production data resulting in greater productivity
as a result of backups completing more quickly
• Lower ongoing IT expenses due to simplified file system administration
and centralized volume management
Example of a virtual tape implementation
Problem: A large international food distributor was facing extreme pressure
in its data recovery operations. With over 35 terabytes of production data, the
organization struggled to meet its backup windows and recovery SLAs.
Scheduled full backups were cancelled frequently for production purposes.
Nightly backups were failing at rates of 10 to 15 percent, which required a
full-time system administrator to troubleshoot the failures. Often cloning
operations or recovery operations conflicted with backup jobs. Recoveries
took from hours to days, depending on the data source and the number of lost
or corrupt files.
Backup windows were stretched and the environment required significant
tuning through use of techniques like multiplexing to maintain backup sys-
tem performance.
Solution: The assessment of this environment led to the determination that
traditional tape technology imposed many limitations that caused perform-
ance and reliability problems. Disk-to-disk recovery seemed like the natural
solution to some problems, but due to the customized backup application, the
implementation would be difficult and too disruptive. Also, without compres-
sion capabilities, the amount of disk required to support this application
would be cost prohibitive. Further analysis showed that a virtual tape
solution was the best fit. This approach eliminated the need to modify the
customized workflow that had been continually optimized over the years and
provided file compression capability to minimize the amount of disk space
required to enable the solution.
Results: Backups are consistently completed within the backup window,
with nearly 100 percent reliability. Data recoveries occur within established
SLAs. The implementation was completed with minimal disruption to the
customized workflow; also, the full-time administrator previously dedicated
to troubleshooting failed backups was re-assigned to other proactive storage
projects.
Point-in-time copies
Types of point-in-time copies
Point-in-time copy software makes a mirrored copy of the production data,
which is split off and assigned to a backup server. This is referred to as “off
host” or “zero impact” backup since the application server is not responsible
for performing the backup. Two types of copies exist: full image and pointer-
based copies. Full image mirrors are complete block-level copies of the origi-
nal data. Pointer-based mirrors are copies of the index information identify-
ing where data exists on the storage array.
Full image mirroring technology
Many storage infrastructures include component redundancy such as disk
mirroring, using RAID 1 to improve the reliability and availability of pri-
mary disk storage subsystems. Often organizations take advantage of mirror-
ing technology to augment their recovery capabilities. By deploying an addi-
tional mirror, this tertiary copy of the data can be separated from the primary
and the first mirror for data recovery operations. This provides a
point-in-time copy of data, which is separate from application and user data.
A separate server can back up this copy without adversely affecting
production on the application server. (See Figure 5.)
Mirror management
Mirror management can be performed by either hardware or software-based
utilities with similar benefits to data recovery operations. One advantage of
using software-based volume administration to manage the point-in-time
copy of the data is that inexpensive storage media can be used for second
mirrors. This allows organizations to invest in innovative RAID technology
for production storage, while using less expensive storage subsystems for
the second mirrors.
Organizations can also use repurposed legacy storage for their second mir-
rors, where performance and availability are not as critical as with production
storage.
Pointer-based snapshot technology
Pointer-based snapshots are copies of the index information identifying
where the data resides on the storage array. Snapshot technology provides a
means of creating parallel, read-only file systems that point to a set of data
intermingled with live-production data. Creating pointer-based snapshots
takes only seconds with minimal impact on the system. There are two general
categories of pointer-based snapshots: file system and storage array.
• File system snapshots are stored as small files on the live file system. The
data that exists at the time of the snapshot is protected from being over-
written on the physical disk, so that it can be referenced from the snap-
shots. This enables consistent static access to files at an identified point in
time, which offers tremendous benefit to data recovery operations.
• Storage array snapshots create a designated area on the LUN where new
data is written in a copy-on-write operation, when data is updated on the
LUN. This allows quick rollback to a point in time, but adds overhead to
the process of updating data on the array, given that one read and two
writes need to occur for each updated block, compared to only one write
in a file system snapshot environment.
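The difference in I/O cost between the two snapshot styles can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the copy-on-write volume preserves each original block in a reserved area the first time it is updated, the pointer-based volume writes new data to a fresh location, and pointer (metadata) updates are not counted as data writes. The class names are hypothetical.

```python
class CopyOnWriteVolume:
    """Storage-array style snapshot: originals are copied aside before overwrite."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data, updated in place
        self.preserved = {}         # reserved area: index -> original block
        self.reads = 0
        self.writes = 0

    def update(self, index, data):
        if index not in self.preserved:
            original = self.blocks[index]     # read the original block
            self.reads += 1
            self.preserved[index] = original  # write it to the reserved area
            self.writes += 1
        self.blocks[index] = data             # write the new data in place
        self.writes += 1

    def snapshot_view(self):
        return [self.preserved.get(i, b) for i, b in enumerate(self.blocks)]


class PointerSnapshotVolume:
    """File-system style snapshot: new data goes to a new location."""

    def __init__(self, blocks):
        self.store = list(blocks)                 # physical blocks, never overwritten
        self.live = list(range(len(self.store)))  # live file system pointers
        self.frozen = list(self.live)             # pointers held by the snapshot
        self.writes = 0

    def update(self, index, data):
        self.store.append(data)                   # single write to a new block
        self.writes += 1
        self.live[index] = len(self.store) - 1    # repoint the live file system

    def snapshot_view(self):
        return [self.store[p] for p in self.frozen]


base = ["a", "b", "c", "d"]
cow = CopyOnWriteVolume(base)
ptr = PointerSnapshotVolume(base)
for volume in (cow, ptr):
    volume.update(1, "B")
    volume.update(2, "C")

print("copy-on-write:", cow.reads, "reads,", cow.writes, "writes")
print("pointer-based:", ptr.writes, "writes")
```

Both volumes still present the original data through their snapshot views; the difference lies only in how much work each update costs on the production path.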
Snapshot edits, additions, and disk space requirements
Data edits and additions are written to a new area on the disk, which means
that snapshots do not require nearly the incremental disk space required for
point-in-time data copies (split mirrors), but generally some incremental disk
space is required. The disk space requirement is dependent on two factors:
• The length of time the snapshots are kept
• The data refresh rate
It is important to manage and cycle the snapshots so that the unneeded disk
space can be released and made available to the live file system. (See Figure 6.)
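A rough sizing estimate based on these two factors can be expressed as a one-line calculation. The Python sketch below is an upper-bound approximation that assumes every day's changes touch blocks not changed on any other retained day; the volume size, change rate, and retention period are hypothetical inputs, not measured values.

```python
def snapshot_space_gb(volume_gb, daily_change_rate, retention_days):
    """Upper-bound estimate of incremental disk space held by snapshots.

    Assumes each day's changes fall on different blocks (worst case), so
    the space held grows linearly with the retention period.
    """
    return volume_gb * daily_change_rate * retention_days


# Hypothetical inputs: 1 TB volume, 2% daily change, snapshots kept 14 days.
estimate = snapshot_space_gb(volume_gb=1000, daily_change_rate=0.02, retention_days=14)
print(f"{estimate:.0f} GB")
```

Halving either the retention period or the refresh rate halves the estimate, which is why regular snapshot cycling matters.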
Snapshots and the data recovery process
Snapshots of data can be taken either to create a consistent point of quick
rollback for inadvertent changes, deletions, or corruption of the data or to
establish a solid point-in-time reference to a live data source to assist data
recovery operations. When snapshots are used as part of the backup, a
snapshot of the data is taken before the backup process begins. Then the data
host mounts the read-only snapshot file system for backup purposes, while
continuing its production use of the live file system. The potential drawback
of this approach is that it creates a lot of I/O activity on the primary
volume, which can impact the performance of a production application.
Full-image point-in-time copies have the advantage here, because the backup
reads from the split mirror rather than from the primary volume.
[Figure 5 diagram labels: Production Server, Backup Server, Data, Mirror, Mirror]
Production splits off a mirror copy of the data. The mirror is mounted to a second server, where the data is backed up without impacting the production server. After backup, data can be resynchronized with the primary data volume.
Figure 5: Data mirroring
For recovery purposes, snapshot file systems can be referenced to recover
files that have been corrupted or inadvertently deleted. In many environ-
ments, snapshot technology is used for up to 90 percent of file recovery,
rather than retrieving the file from tape or other secondary media. This recov-
ery method greatly improves performance, eases administration, and serves
to complement traditional recovery technologies. Snapshots also provide IT
organizations the ability to deliver on the most stringent RPO requirements
related to recovering corrupted or deleted files.
Challenges
One challenge that exists with snapshot technology is the management of file
system and storage array snapshots. These technologies can generate count-
less snapshots. This is good because snapshots offer such a dramatic RPO
improvement over traditional tape backup, but they do not offer a good cen-
tralized method of systematically generating, managing, and releasing the
snapshots.
This seems to be a new battleground for backup software products, as these
vendors recognize the value of incorporating this management intelligence
into the traditional backup application interface. As this capability is
incorporated into the environment, it allows snapshots to be managed
centrally and consistently across all storage platforms.
A second challenge is that snapshot implementations can cause a heavy hit
on network performance. Backing up at the byte or block level is
substantially more efficient than copying an entire file. An optimal solution
makes light demands on performance and works not only for remote offices but
for the data center as well.
[Figure 6 diagram labels: Base Data, Data, Snapshot, Snapshot Directory. Annotations: Creates snapshot of base data and a snapshot directory with pointer to data at point in time. Writes new blocks, updating base data; points to base blocks in snapshot directory. New blocks will not be overwritten. Directory points to data changed in base data.]
Figure 6: File system data snapshots
Operational benefits
This solution offers very fast full data recoveries, a decrease in network traf-
fic, and backup times that are reduced to minutes. It reduces the organiza-
tion’s recovery time capability to hours versus days and helps consolidate
backup software licenses.
Best fit
Off-host backup provides a method for delivering quality, predictable, point-
in-time data protection, using standard data recovery technologies. Given the
declining cost per megabyte of disk storage, this approach is gaining popular-
ity as a way of performing backups without hindering production operations
in environments with shrinking or non-existent backup windows.
Business value impact of point-in-time copies
Point-in-time copies deliver the following business benefits:
• Reduced risk of lost intellectual property due to data loss, given the capa-
bility to generate frequent snapshots
• Higher employee productivity due to shortened downtime in the event
that data needs to be recovered from a backup source due to fast accessi-
bility of point-in-time snapshot data
• Reduced IT labor costs as less time is spent recovering data given the fast
nature of snapshot recovery
• Improved customer satisfaction as customers experience fewer outages
from the business, and their inquiries are answered more quickly on aver-
age given that employees will have improved access to critical data
Example of a point-in-time copy solution implementation
Problem: A large organization ran its production systems on servers with
direct-attached enterprise RAID storage subsystems. The data recovery
method was to simply attach tape devices to each file server. The total file
storage was roughly 20TB. A full backup cycle could take up to 40 hours and
nightly incremental backups would consistently extend into the next business
day. The environment lacked high availability; often, recovery from outages
and data losses would take days due to high capacity and slow, legacy tape
technology.
Solution: A high availability, high performance, well-protected solution was
installed to address these problems. The solution comprises file servers (as
before), but in a clustered configuration, redundantly attached to two enter-
prise RAID subsystems with Fibre Channel and Serial ATA drives. The Fibre
Channel drives house the production file shares while ATA storage provides
Disk-Based Backup & Recovery: Making Sense of Your Options
2007 Datalink. All Rights Reserved. www.datalink.com 18
mirrored copies that are systematically split from the production data for off
host backup. For tape backup, a group of servers running data protection
software was installed. Every night, these servers get a copy of the produc-
tion data from FC LUNs. Clone copies are written to ATA RAID LUNs. The
servers mount the ATA volumes containing a replica of the production data.
Backup servers write that data via a FC SAN to an enterprise tape library
subsystem. Two tape copies of the data are written simultaneously
for immediate offsite storage. Mirror copies remain mounted on the backup
servers until the next backup cycle, which provides for direct network-share
recovery and a test platform for patches, software, etc.
Results: Full backups have gone from 40 hours to just seconds, in terms of
the time that production is impacted. The high availability capabilities have
significantly reduced service outages and the risk of data loss. If a data loss
were to occur, an exact replica of production data would be ready for full
volume-level recovery. Primary recovery has gone from slow, outdated tape
technology to a simple network share via the backup servers, housing the
previous day’s data. Users simply drag and drop files or folders as a means
of recovery. Volume recovery has gone from roughly 10 to 12 hours to min-
utes with mirror reverse synchronization and four to six hours if tape needs
to be used.
Continuous data protection
What is continuous data protection?
Continuous data protection (CDP) is a relatively new, emerging technology
designed to continuously capture or track data modifications and store
changes independently of the primary data, enabling recovery points from
any point in the past. In effect, CDP creates an electronic journal of complete
storage snapshots, one storage snapshot for every instant in time that data
modifications occur. A major advantage of CDP is that it preserves a record
of every transaction that takes place in the enterprise. CDP can support RPOs
of zero as well as protect against data corruption errors because data can be
rolled back to the exact instant before the error occurred.
The working definition for CDP from the Storage Networking Industry
Association (SNIA) CDP Special Interest Group is “a methodology that con-
tinuously captures or tracks data modifications and stores changes independ-
ently of the primary data, enabling recovery points from any point in the
past. CDP systems may be block, file, or application-based and can provide
fine granularities of restorable objects to infinitely variable recovery points.”
How it works
CDP is a front-end protection system that is always on, operating unobtru-
sively to enterprise applications. A typical CDP system is based on a disk
storage infrastructure to log the continuous data changes as well as provide a
time-indexed view into historic points in time. As such, CDP systems may
require additional processing resources depending on the approach. In return,
CDP can provide enterprise IT organizations with seamless, near-instanta-
neous recoveries from logical and physical data corruption events stemming
from many sources, including operator errors. Recovery can be in seconds or
minutes rather than the hours that a traditional application and data restora-
tion operation may entail.
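The journal-plus-time-index idea can be sketched as follows. This minimal Python example assumes a block-level journal whose entries arrive in timestamp order; the class name, method names, and sample data are hypothetical, and a production CDP system would also have to handle write ordering, consistency groups, and journal pruning.

```python
import bisect


class ChangeJournal:
    """Hypothetical CDP journal: every write is logged with its timestamp."""

    def __init__(self, base):
        self.base = dict(base)  # block image at the moment protection began
        self.times = []         # timestamps, assumed non-decreasing
        self.changes = []       # (block, data) pairs in the same order

    def record(self, timestamp, block, data):
        """Log one write independently of the primary data."""
        self.times.append(timestamp)
        self.changes.append((block, data))

    def view_at(self, timestamp):
        """Rebuild the block image as it stood at the given point in time."""
        image = dict(self.base)
        upto = bisect.bisect_right(self.times, timestamp)
        for block, data in self.changes[:upto]:
            image[block] = data
        return image


journal = ChangeJournal({0: "ledger-v1", 1: "index-v1"})
journal.record(100, 0, "ledger-v2")
journal.record(105, 1, "index-v2")
journal.record(110, 0, "CORRUPT")  # a logical corruption event

before_error = journal.view_at(109)
print(before_error)
```

Because every change is retained with its time, the recovery point is chosen at restore time rather than at backup time, which is what allows a rollback to the instant before an error.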
Challenges
The challenges with implementing a CDP solution include:
• Managing complexity. Because of the number of heterogeneous products
often involved, it is important to select a CDP solution that minimizes the
number of interfaces data administrators must master. Otherwise, the vol-
ume of point product interfaces involved will rapidly transform simplified
environments into ones that are vastly more complex.
• Designing reliability and availability. With the aggressive RPO require-
ments, many factors must be considered and architected into the solution.
These include dual paths, ensuring read/write acknowledgements, asyn-
chronous versus synchronous replication, etc.
• Changing business practices. Often a CDP solution requires IT and busi-
ness process changes to support the new technology or empower end-
users to restore files.
• Maintaining financial responsibility for a CDP solution. IT must be dili-
gent in selecting which data to protect. While it would be ideal to protect
all data through a CDP solution, it is often not fiscally practical, so the
EDR continuum must be leveraged to select the most appropriate technology
for different applications based on the RPO and RTO requirements.
[Figure 7 diagram labels: Application Server (several), SAN, Storage Array Site 1, Storage Array Site 2, Source Vol, Target Vol, Journal. Annotations: Copy of every change. Instant restore to last written change. Site 2 is for site 1 disaster and provides instant failover.]
Figure 7: Continuous data protection
Operational benefits
Given the real-time nature of transmitting updates, this approach yields the
best possible recovery point, minimizing the amount of data loss in the event
of a disaster. With any CDP implementation, the benefits are:
• Near zero data loss
• Backup window reduced to virtually zero
• Improved network efficiency
Best fit
CDP technologies are largely implemented as components of a business con-
tinuance, disaster recovery, or high availability strategy. When these tech-
nologies are present, they can be leveraged for data recovery purposes by
providing faster recovery times from storage subsystem failures or they can
provide off host backup capabilities.
CDP will best address the local backup and recovery of data associated with
a particular set of critical applications with the following general characteris-
tics:
• A high number of writes, thus data changes frequently
• Need to run continuously (24×7×365) and have a major business impact
when down
• Amount of data (database) is large, thus making activities like traditional
backup difficult and time consuming
• Large transactional systems
• Application with an RTO of seconds/minutes (near zero) and RPO of zero
or near zero (last completed transaction)
Business value impact of CDP
CDP is an emerging technology that enables enterprises to increase their
ability to provide sustained application availability and roll back to
specific points in time. Business benefits are:
• Less productivity lost in event of data outage due to immediate accessibil-
ity of online replica data
• Continued customer satisfaction in the event of a disaster, when access to
customer data can be nearly seamless
• IT cost savings by allowing end users to restore their own files
Example of CDP implementation
The following represents a possible scenario in which CDP would be used.
Problem: A large Wall Street financial company was encountering problems
with data availability for its customers, brokers, and traders. Specific pain
points were:
• Operations need to provide 24×7×365 access to customer records
• Large volume of data with 85% annual growth
• RTO of less than a minute and RPO near zero
Solution: The financial company engaged a business partner for an assess-
ment of their customer database. The recommended solution was to continu-
ously capture data changes and store them independently of the primary data.
Hardware replication was chosen to minimize the impact of replication I/O
on production processes. At the disaster recovery site, archive copies are
written simultaneously to tape, creating an immediate offsite copy.
Results: The implemented solution has met its objectives; the backup win-
dow has been reduced to virtually zero, data loss has dropped to near zero,
and overall network efficiency has improved. Also, the solution has been
integrated with business continuity and disaster recovery.
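As a minimal sketch of the approach described in the solution above (continuously capturing data changes and storing them independently of the primary data), the following Python fragment keeps a time-ordered journal of writes and rebuilds the data as of any earlier moment. The class name ChangeJournal, its methods, the integer timestamps, and the use of None to mark a deletion are assumptions made for illustration; they do not describe the financial company's implementation or any particular CDP product.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class ChangeJournal:
    """Hypothetical CDP journal kept separately from the primary data."""
    baseline: dict = field(default_factory=dict)  # state before the first journaled write
    entries: list = field(default_factory=list)   # (timestamp, key, value) in time order

    def record(self, timestamp, key, value):
        # Entries must arrive in time order so the journal stays sorted.
        if self.entries and timestamp < self.entries[-1][0]:
            raise ValueError("journal entries must be appended in time order")
        self.entries.append((timestamp, key, value))

    def restore(self, point_in_time):
        """Rebuild the data as it stood at point_in_time (inclusive)."""
        state = dict(self.baseline)
        timestamps = [t for t, _, _ in self.entries]
        cutoff = bisect_right(timestamps, point_in_time)
        for _, key, value in self.entries[:cutoff]:
            if value is None:      # None marks a deletion in this sketch
                state.pop(key, None)
            else:
                state[key] = value
        return state


journal = ChangeJournal(baseline={"acct-1": 100})
journal.record(10, "acct-1", 150)
journal.record(20, "acct-2", 75)
journal.record(30, "acct-1", None)   # an unwanted deletion at time 30

# Roll back to just before the unwanted change.
before_fault = journal.restore(29)
```

Because every change is retained, the recovery point can be any moment covered by the journal, which is what allows an RPO of the last completed transaction. The cost is that the journal grows with the write rate of the application.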
Adding depth to enhanced data recovery
Data deduplication
An emerging technology gaining a significant amount of attention is the
elimination of redundant data, or data deduplication. It is having a dramatic
impact on enhanced data recovery solutions and spans all four architectures:
D2D, virtual tape, point-in-time copies, and continuous data protection.
Recent product offerings provide more advanced data reduction methods and
promise significantly improved benefits. When considering a consolidated
storage environment, the cost of storing online data and multiple backup or
archive copies consumes a significant amount of IT and data center
resources. If an organization can reduce the amount of stored data (data at rest),
then it can reduce the capacity required as well as environmental requirements such as
floor space, power, and cooling to support that capacity. Likewise, if the
unique data can be isolated, the bandwidth required to transmit that data to a
remote site will be significantly less.
The potential to achieve an extremely high ROI with this technology has led
to solutions that are leveraging more efficient deduplication. The most com-
mon solutions being adopted in the market today include:
• Storage application solutions. A business or storage application removes
redundant data prior to storing or transmitting the data. It typically
achieves this by comparing data patterns to data already sent or stored in
a common data repository. Examples of this are backup applications with
granular single instance storage repositories or data replication solutions
that compare source and target data sets and send or store only unique
data patterns.
• Storage device solutions. A storage device or appliance emulating a stor-
age device eliminates redundant data by comparing all new data objects
to a common data repository of unique patterns. This is done either as
data is being sent to the storage device (inline), or as a background
process to clean up redundant data after it is stored (post-processing). The
most common implementation of data storage deduplication is in backup
solutions that leverage virtual tape appliances. Recently, some advanced
file systems have announced data deduplication solutions intended for use
with online data.
• Storage networking solutions. Storage networking solutions designed to
move large amounts of data over long distances are also adding data
deduplication as a component of their solutions. These solutions also
leverage common data repositories to send references to data that has
been recently sent and compression to aid in efficiently transmitting new
unique data. This is a key feature of most WAN Optimization Controller
solutions.
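The common idea behind all three solution types, comparing new data against a repository of unique patterns and keeping only what is new, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any product: the DedupStore class, fixed-size chunks, and SHA-256 fingerprints are choices made here for brevity, and commercial solutions may chunk and compare data quite differently.

```python
import hashlib


class DedupStore:
    """Hypothetical repository that keeps one copy of each unique chunk."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}         # fingerprint -> chunk bytes (the common repository)
        self.logical_bytes = 0   # total bytes clients asked the store to keep

    def write(self, data: bytes):
        """Deduplicate inline; return the list of fingerprints for this data."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Only chunks not already in the repository consume new capacity.
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe):
        """Reassemble the original data from its list of fingerprints."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def physical_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore(chunk_size=4)
monday = store.write(b"AAAABBBBCCCC")
tuesday = store.write(b"AAAABBBBDDDD")   # only the "DDDD" chunk is new
```

In this toy run the two backups total 24 logical bytes but occupy 16 physical bytes, because the second backup adds only one new chunk. The same fingerprint list is what a networking solution would send in place of data the remote site already holds.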
It can be very complex to properly size and implement data deduplication
solutions, but the benefits can provide dramatic savings in IT resources.
These solutions are gaining traction in many components of enhanced data
recovery solutions.
Again, it is important to understand the application and lifecycle require-
ments to properly integrate the benefits of data deduplication into an
enhanced data recovery solution.
Replication
Another technology that provides greater depth to enhanced data recovery is
data replication. In essence, data replication can serve as the disaster recovery
component of any of the four architectures highlighted in this paper. Data
replication technology provides the ability to create a secondary copy of data
primarily for disaster recovery. It gives organizations the ability to quickly
recover data at an offsite location when a disaster occurs, delivering
tremendous recovery time capabilities. Given the real-time nature of transmitting
updates, this approach yields the best possible recovery point, minimizing the
amount of data loss in the event of a disaster. With a wide area high avail-
ability strategy and adequate server resources at a disaster recovery site,
replication technology can help provide the highest level of disaster
resilience in a data center.
While data replication provides benefits generally associated with disaster
recovery, the replica data can be leveraged for recovery purposes as well.
Some organizations perform their regular backups on the replica data, yield-
ing these added benefits:
• Replica copies that can be mounted directly, eliminating recovery steps
• Off-host backup, reducing the impact on production servers during backup
• Reduced network traffic
• Reduced media requirements
• Solid recovery point and recovery time capabilities
This approach can also be used to centralize the data recovery operations of
several facilities, yielding greater efficiency in data recovery operations.
Replication technologies are largely implemented as components of a busi-
ness continuance, disaster recovery, or high availability strategy. When these
technologies are present, they can be leveraged for data recovery purposes by
providing faster recovery times from storage subsystem failures or they can
provide off-host backup capabilities.
The implementation of replication technology offers the following business
benefits:
• Less productivity lost in event of data outage due to immediate accessibil-
ity of online replica data
• Reduced IT labor costs due to centralization of backup tasks
• Lower tape media costs due to consolidation of tape resources to a central
location
• Continued customer satisfaction in the event of a disaster, when access to
customer data can be nearly seamless
• Improved productivity of business people in branch offices who may have
previously been tasked with managing backup processes
For additional information on data replication, refer to the Datalink white paper,
“A Detailed Look at Data Replication Options for Disaster Recovery
Planning.”
Summary
Compromise no longer necessary
No organization can cost-effectively protect all of its data assets with just one
enhanced data recovery architecture. Less critical data may require a simple
tape backup for operational recovery while the most critical data may require
a more robust solution like continuous data protection. Companies that use a
single technology to meet the protection needs of multiple data types will
likely see excessive exposure to data loss or excessive costs. The most effec-
tive approach combines the various technologies available into a tiered pro-
tection infrastructure that delivers the most appropriate levels of protection to
data based on its value to the organization. It is no longer necessary for
organizations to compromise – either accepting greater risks than they need
to, or implementing a solution that overprotects data and costs more to
implement than is justified by the value of the data.
Understand the bottlenecks and pain points
Data and storage requirements are growing at unbelievable rates for business-
es of every type. What does this mean for IT managers? It means that the
benefits from disk storage subsystems used in data protection environments
must be assessed against the organization’s data recovery bottlenecks and
pain points. The assessment of a solution’s effectiveness examines whether
its overall benefits balance well against its human, corporate, and financial
costs. IT managers must have clearly defined and prioritized objectives based
on the organization’s pain points and SLAs and an understanding of the per-
formance bottlenecks within a data recovery environment to determine if disk
can play a role.
Role of disk
It is clear that disk can play a significant role as an enhancement to tape in
data recovery operations, but it has not yet proven to be a panacea. Disk is
merely one more tool that can be used to improve an organization’s data
recovery capabilities. If, after a detailed assessment, an organization determines
that disk can play a role, the next step is to architect a
solution that combines the right technologies.
Technologies
The most attractive EDR technologies are summarized below.
Technology                   Summary
Disk-to-Disk Recovery        A data recovery approach using disk technology as an
                             intermediary step before archiving the data to tape
Virtual Tape                 A disk-based data recovery solution that emulates a
                             tape device
Point-in-Time Copy           A physical or virtual copy of data on disk that is
                             primarily used as a read-only copy for data recovery
                             purposes
Continuous Data Protection   A continuous record of data modifications which allow
                             recovery to any prior point-in-time
Partnership with Datalink
Interested in learning more about enhanced data recovery solutions and
whether or not they make sense in your environment? Consider a partnership
with Datalink.
Datalink is a leading information storage architect. We analyze, design,
implement, and support information storage infrastructures that store, protect,
and provide continuous access to information. Datalink’s specialized capabilities
and solutions span storage area networks, network-attached storage,
direct-attached storage, and IP-based storage, using industry-leading hard-
ware, software, and technical services.
AssessLink consulting services help organizations develop information stor-
age strategies and tactics that align with business needs. Datalink storage
consultants conduct comprehensive data research, develop current versus
desired state gap analyses, and make storage infrastructure and practice rec-
ommendations.
For more information, contact Datalink at (800) 448-6314 or datalink.com.