BACKUP OPTIMIZATION "NETWORKER INSIDE"

Shareef Bassiouny, EMC
Mohamed Sohail, EMC
Giovanni Gobbo, Senior IT Consultant
2014 EMC Proven Professional Knowledge Sharing 2
Table of Contents
Executive summary
Introduction
Part 1
  How much Data Storage could be gained? How could it be maximized?
  What is the penalty of this gain?
  Classic design example
  Advantages/disadvantages of the new DD Boost over Fibre Channel (DFC)
Part II
  Journey to an optimized backup environment
  The Journey
Steps to the solution
  NetWorker
  Data Domain
  Avamar
  “Virtualized Environments”
Appendix
Biography
Disclaimer: The views, processes, or methodologies published in this article are those of the
authors. They do not necessarily reflect EMC Corporation’s views, processes, or
methodologies.
Executive summary
Do you need to speed up your backups by up to 50%? Do you need to reduce your bandwidth
usage by up to 99%? Do you want to reduce the backup server workload by up to 40%? Do you
want to increase your backup success rate?
The answer? Data Domain® Boost (DD Boost), which enables you to finish backups within
backup windows and provides breathing room for data growth. With performance of up to 31 TB/hr,
it is 3 times faster than any other solution, enabling you to use your existing network
infrastructure more efficiently.
In this Knowledge Sharing article we illustrate how we optimized our backup processes and
leveraged current resources by integrating NetWorker® backup management software and the
new DD Boost over Fibre Channel feature to enhance backup system performance.
The major component of EMC backup and recovery software solutions, NetWorker is a
cornerstone element in the backup solutions of large infrastructure customers. This article targets
backup administrators, support engineers, and stakeholders interested in the importance of the
DD Boost over Fibre Channel feature and how to use it to enhance backup success rates. The
goal of this article is to help you:
- speed up backups
- avoid congestion that slows down large critical backups by reducing bandwidth utilization
- minimize workloads on backup hosts (NetWorker server and Storage Nodes)
Introduction
In Part 1, we follow a dialogue we had with a customer while promoting Data Domain for their
backup environment, which led us to promote NetWorker as one of the products best integrated
with Data Domain appliances. Part 1 is a series of questions and answers that explore the why
and the how, concentrating on the basic concepts and leaving the details to the referenced
documents, primarily the “NetWorker and Data Domain Devices Integration Guide”, version 8.1.
Part 2 is the final output of the customer conversation from Part 1, coupled with the data we
had from the customer requirements documents. We then produced a solution proposal that
relied on the concepts built in Part 1, along with details on how those products fit into the
customer environment.
Part 1
While deduplicated storage as a backup media target is not a new concept in Backup and
Recovery Solutions architecture (BRSa), the technology used for this deduplicated storage is
one of the major factors that affects backup performance and success rates.
A well-known example is the EMC DL3D, which integrated multiple storage technologies to
achieve Backup to Disk (B2D) performance through a Virtual Tape Library interface, coupled
with back-end storage deduplication. However, since the deduplication process ran offline,
appliance performance was known to deteriorate beyond 70-80% disk utilization.
Data Domain emerged as a cutting-edge technology for deduplicated storage targeting backup
solutions as backup-to-disk storage. Its “in-line” deduplication technology (data is
deduplicated before being written to disk, as soon as it reaches the storage host) and high
performance made it one of the best-selling products in the EMC Data Protection and
Availability Delivery portfolio. Perhaps the main reason for its market appeal is the sustained
performance it delivers (minimal performance degradation even beyond 95% utilization) and the
diverse storage connectivity options it provides. Further integration with backup solutions led to
DD Boost, one of the most interesting features provided with Data Domain appliances.
DD Boost comprises Distributed Segment Processing (DSP) coupled with the DD API. DSP is
a mechanism that enables client-side deduplication to be integrated into virtually any application
that needs to dump data to secondary backup storage media. The DD API is the Data Domain
programming interface that enables applications/hosts to communicate with the Data Domain
Operating System (DDOS) in a way that leverages this integration interface to provide more
features and facilities to “boost” performance, minimize the backup window and bandwidth
utilization, and enhance backup success rates.
Basic concepts mentioned in the following discussion include:
Brief Blueprint on Deduplication Technologies
Deduplication and compression share the same aim: to remove redundancies from data
patterns. While compression's scope is a file or an archive of files, deduplication's scope is a
file system used to store backup data, also called a Storage Unit (SU) in Data Domain jargon.
Here, we are not talking about file-level deduplication (which hashes the contents of every file
on the file system, detects duplicate content, and removes the duplicate copies, replacing them
with stub pointers to the original content).
Figure 1: File-based deduplication
We are talking about sub-file deduplication technology which segments every file using a certain
segmentation algorithm—the most efficient have been found to be variable length
segmentation—into chunks. It is those chunks that are identified by their hash fingerprints, so if
a duplicate chunk is found it is replaced by a pointer to the original chunk (the first one found to
be unique). This is the technology used for Data Domain deduplication, taking into account that
an added layer of compression is applied after new/unique chunks are identified.
Figure 2: Sub-file, variable length chunks deduplication
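The mechanism above can be sketched in a few lines of Python. This is a deliberately simplified model, not Data Domain's actual algorithm: the SHA-1-based cut-point test stands in for a real Rabin-style rolling hash, and the chunk sizes are toy values. It shows the essential flow the text describes: variable-length chunking, fingerprinting each chunk, storing only unique chunks (compressed after deduplication), and keeping a "recipe" of pointers for each file.

```python
import hashlib
import zlib

def chunk(data: bytes, avg_size: int = 256, window: int = 8):
    """Toy content-defined chunking: cut wherever a hash of the last
    `window` bytes hits a target value, capping chunk size at 4x average.
    Real systems use a Rabin-style rolling hash; this is only illustrative."""
    out, start = [], 0
    for i in range(window, len(data)):
        h = int.from_bytes(hashlib.sha1(data[i - window:i]).digest()[:4], "big")
        if h % avg_size == 0 or i - start >= 4 * avg_size:
            out.append(data[start:i])
            start = i
    out.append(data[start:])
    return [c for c in out if c]

class DedupStore:
    """Sub-file deduplication: each unique chunk is stored once, keyed by
    its SHA-256 fingerprint, and compressed only after it is found unique."""
    def __init__(self):
        self.chunks = {}  # fingerprint -> compressed chunk

    def write(self, data: bytes):
        """Returns the 'recipe' (fingerprint list) that stands in for the file."""
        recipe = []
        for c in chunk(data):
            fp = hashlib.sha256(c).hexdigest()
            if fp not in self.chunks:      # duplicates become mere pointers
                self.chunks[fp] = zlib.compress(c)
            recipe.append(fp)
        return recipe

    def read(self, recipe):
        """Rehydration: reassemble the original bytes from the recipe."""
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in recipe)
```

Writing the same data a second time adds no new chunks to the store; only the recipe (pointers) grows, which is exactly the disk-space gain discussed next.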
How much Data Storage could be gained? How could it be maximized?
While deduplication efficiency varies according to several factors, a 20x disk space
reduction is typical for plain, uncompressed file system data. The main factors that
affect deduplication efficiency include:

Data type or nature: some types of data are highly compressible (text files,
spreadsheets, etc.) while other types are already compressed in nature
(audio/video files, graphics), so recompressing them will not produce a
significant benefit. Because deduplication relies on file segmentation and
file-chunk identification, any change applied to incoming files (such as
compression and/or encryption) will produce new patterns of chunks, even for
minor changes to those files, and thus reduce the gain from the deduplication
operation.

Change rate: storage savings increase with each subsequent backup of the
save set, because a deduplication backup writes to disk only those data blocks
unique to its catalogue. Data with a high change rate therefore produces a
lower gain than data with a lower change rate.
Data retention: the amount of time data is intended to be kept available for
recovery affects the size of the data catalogue (imagine a database of hashes
representing every stored chunk). If you retain data for a longer period, your
catalogue is larger and your deduplication efficiency increases (there is a
higher probability of finding matching chunks).

For more information, see page 27 of the “NetWorker and Data Domain Devices
Integration Guide”.
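The interplay of change rate and retention can be put into a back-of-envelope model. This is our own simplification, not an EMC sizing formula: it assumes every backup logically protects the full save set, while only the first full plus the changed fraction of each later backup lands on disk as new chunks (compression ignored).

```python
def dedup_ratio(full_gb: float, change_rate: float, backups: int) -> float:
    """Rough model of deduplication efficiency: logical data protected
    divided by unique data actually stored across `backups` generations."""
    logical = full_gb * backups            # what the backups represent
    physical = full_gb + full_gb * change_rate * (backups - 1)  # new chunks only
    return logical / physical
```

For example, 30 daily fulls of a 1 TB save set at a 2% daily change rate come out to roughly 19x, in line with the ~20x the article quotes; raise the change rate to 10% and the ratio drops below 8x, while doubling the retention to 60 backups pushes the ratio higher.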
What is the penalty of this gain?
While not really a penalty, as with any compression algorithm, uncompressing
(rehydrating) the data consumes time and effort. However, the DD appliance is
engineered to make data rehydration as painless as possible. With DD Boost devices,
concurrent sessions per device may extend up to 60 sessions per DD Boost device
(multiple recoveries will not impair each other, nor will any application supporting parallel
recovery); each session can easily reach 50 MB/s in a good network infrastructure
supporting Gigabit Ethernet. Backup performance becomes very high following the first
full backup (how high depends on the change rate), but recovery performance will be
comparable to the performance of the first full backup because the data will be rebuilt in
its plain format, then sent to the recovering host.
Classic design example
When configuring an environment for Backup to Disk, there are many alternatives in the choice
of target media type (local disk, SAN-connected, or even NAS-attached).
One could settle for the simplest approach and export a NAS (Network Attached Storage) file
system (CIFS or NFS, per your client platform preference) as a backup-to-disk target file
system that can be mounted on a backup server, any of its Storage Nodes (SNs), or even a
Dedicated SN (DSN): a client application host used as an SN only for its own data. This aims
to optimize LAN access, since the data goes from the application host directly to the backup
storage instead of passing through a generic storage node. It is an optimization configuration
that avoids having the backup data flow (client to SN, then SN to NAS appliance) traverse the
LAN twice, and it is needed when network access is a bottleneck.
A Data Domain host can be configured to export a NAS file system. Once your target disk is
ready, configure your disk device; in NetWorker, the Advanced File Type Device (AFTD) is the
type most used.
Even without backup software, this NAS file system can still be used as a target for an Oracle
RMAN backup script or an MSSQL backup script, and the Data Domain host will deduplicate the
resulting backup files. It is here, however, that DD Boost demonstrates its added value, as you
will soon discover that backup performance is limited by the network bandwidth.
To tackle network congestion at the target Data Domain host, the appliance can be configured
for NIC (link) aggregation to optimize network connectivity. Link aggregation on the Data
Domain host side will certainly help. Still, even with link aggregation deployed without any
problems, there are physical limitations to any LAN that it cannot bypass.
Different aggregation protocols and hashing methods exist among the Data Domain
configuration options. It is important to note that link aggregation is a point-to-point
mechanism, not end-to-end: it aggregates the switch ports facing the Data Domain NICs into a
single virtual interface, but the clients are not aware of this mechanism. Details are available
in the Data Domain OS administration guide.
If you do not favor backup to disk over the LAN for any reason, e.g. it saturates your LAN links
or strains your LAN infrastructure, you can use your hosts' SAN connectivity to connect to the
Data Domain virtual tape library (VTL). This allows data to travel on the SAN through Fibre
Channel (FC) connectivity without any data transport overhead on the LAN, which is still used
in this case, but only for metadata transport to the backup server.
What if the above options are not enough? What if we have a tighter backup window and
need more optimizations?
DD Boost is the answer. As the volume of data targeting the Data Domain host—sent over the
wire as plain data—scales up, the load on your LAN and the pressure on your backup system
rise, especially as more backup-to-disk clients and storage nodes are added to your data
center.
In such situations, client-side deduplication or, in Data Domain parlance, Distributed Segment
Processing is your solution, as it enables identification of file chunks to take place on the
deduplication client side (the host that sends data to the Data Domain appliance). Thus, there is
no need to send all plain data on the network; only new chunks need to be sent to the Data
Domain host.
In other words, DD Boost with its DSP feature working through Data Domain API ensures that
the host that is sending data to the Data Domain appliance is not sending redundant data over
the network. Any DD Boost-enabled application will compute the hashes of the chunks of data
that it wants to send to the Data Domain host for storage, then ask the Data Domain host : do
you already have those fingerprints (as an identifier for each file chunk) in your catalogue? If so,
the Data Domain host does not need to receive redundant data; it will just create the pointer. If
not (this is a new data chunk) it is compressed, then sent to the Data Domain host for storage.
Figure 3
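The fingerprint negotiation just described can be sketched as a two-party exchange. Again this is an illustrative model under our own names (`BoostTarget`, `boost_send`), not the DD Boost wire protocol: the client hashes locally, asks the target which fingerprints it already holds, and puts only compressed new chunks on the wire.

```python
import hashlib
import zlib

class BoostTarget:
    """Toy stand-in for the appliance side of the DSP exchange."""
    def __init__(self):
        self.catalogue = {}  # fingerprint -> compressed chunk

    def filter_known(self, fingerprints):
        """Tell the client which fingerprints are already in the catalogue."""
        return {fp for fp in fingerprints if fp in self.catalogue}

    def store(self, new_chunks):
        """Receive only the chunks the target did not already have."""
        self.catalogue.update(new_chunks)

def boost_send(target: BoostTarget, chunks):
    """Client side: fingerprint locally, ask the target what it has, then
    compress and send only new chunks. Returns bytes put on the 'wire'."""
    by_fp = {hashlib.sha256(c).hexdigest(): c for c in chunks}
    known = target.filter_known(by_fp)
    new = {fp: zlib.compress(c) for fp, c in by_fp.items() if fp not in known}
    target.store(new)
    return sum(len(p) for p in new.values())
```

Sending the same chunk list a second time transfers zero payload bytes, which is the bandwidth saving the text attributes to DSP.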
In this way, data redundancy checking becomes a mutual effort between the deduplication client
(host sending data using DD Boost functionality) and the Data Domain host appliance, which
optimizes the network usage in exchange for a minimal CPU and memory penalty on the client
side.
Projecting the above concept onto NetWorker operations, we can see that all that is needed is
to make NetWorker backup-to-disk devices DD Boost-enabled. Thus, we do not have to worry
about which NAS protocol to use for network file access (DD Boost handles that part natively).
Even device directory creation is managed through DD Boost, as NetWorker can talk directly to
the Data Domain OS through the DD Boost API. Details on DD Boost device creation are found
in the “NetWorker and Data Domain Devices Integration Guide”; migration from old tape
devices and backup-to-disk devices to DD Boost-optimized devices is discussed in Chapter 3
of the same document.
The bandwidth and time gains are quite astonishing as NetWorker adds more uses of the DD
Boost API. The “client direct” configuration option (added in NetWorker 8.0) enables backup
clients to send their data directly to the Data Domain host instead of passing through their
configured Storage Node. This optimizes network usage and accelerates backup execution, as
clients no longer send plain data to their SNs over the wire. Though the SN is still used for
metadata processing, it is not stressed by the data storage effort, which increases the
likelihood of backup success.
Figure 4
This is not the only gain from DD Boost, but it is how we chose to introduce its utility. Two
further great gains arise from the fact that the backup application can talk to the Data Domain
OS and see the deduplication catalogue:
1. Clone Controlled Replication
2. Virtual Synthetic Full
How does DD Boost enhance cloning? Does that include cloning to all types of media?
Cloning is copying a saveset from one storage media to another. A common example is cloning
savesets from disk devices to tape devices for long term retention. Thus, the cloning operation
includes reading the saveset (similar to recovering it) then writing it back to another media
(similar to backup).
The scope of DD Boost cloning is not related to storage media other than Data Domain, which
means if you are cloning your savesets between storage media that includes anything other
than Data Domain hosts, you will be running conventional cloning (recover the saveset from
media A + write it to media B). However, if you are cloning your saveset between two Data
Domain hosts, this is your chance to leverage DD Boost Clone Controlled Replication (CCR).
Figure 5
How does it work?
When both source and target storage pools are Data Domain devices, DD Boost saves backup
system resources, i.e. CPU, memory, and network bandwidth, through the Managed File
Replication (MFR) feature. How this happens is an interesting story. The cloning operation
reads the saveset it wants to clone from DD Boost device A on Data Domain host A (like a
recovery), then must write it to DD Boost device B on Data Domain host B, where the saveset
is stored in the form of one or more files. Why not simply tell Data Domain host A to replicate
those files to Data Domain host B? This saves the effort of reading the whole file (a
rehydration operation) and writing it back (a dehydration operation) to a different host. Also,
the bandwidth that would be utilized to read the plain data saveset is preserved, because Data Domain
replication copies only the chunks missing (to construct that file) from source to destination. This
makes CCR a great candidate for cloning to the Disaster Recovery (DR) site, satisfying
legislative requirements to archive your backups offsite for financial auditing, corporate internal
auditing requirements, DR planning, and contingencies without the need to clone to tape.
Figure 6
What advantage does CCR have over the conventional Data Domain replication?
Quite a few. Data Domain conventional replication has three limitations:
1. The backup server will not be aware that cloning took place, so manual intervention will
be needed to create and mount the required device, if recovery from DR is needed.
2. You should not use conventional cloning with DD Boost devices (the replicated devices
cannot be used as a source for further replication). For more information, see Data
Domain native replication considerations of “NetWorker and Data Domain Devices
Integration Guide”
3. There is no way to force different retention on cloned savesets as the backup server is
not aware of replication. Consequently, this cannot be used for long term archiving.
Any other cloning enhancements?
It is important to mention that NetWorker 8.1 added a new cloning enhancement called
“immediate cloning”. This enables a saveset to be cloned as soon as its backup is done, as
opposed to group cloning, which runs a clone process for all savesets backed up during the
group run, and scheduled cloning, which runs the clone process on a schedule, separate from
the backup run.
For more information on how to configure and run CCR clones, refer to the “NetWorker and
Data Domain Devices Integration Guide”
What is the Virtual Synthetic Full feature added in NetWorker version 8.1? How does it
leverage DD Boost for further optimizations of backup operation?
First, let's define what a Synthetic Full (SF) backup is. Suppose you need to run a full backup
before rolling out a critical system patch or cumulative update, but your backup window does
not have enough time for one. The solution is either to take the needed time from production
(typically not an option) or to postpone the critical update. This is when SF comes to the
rescue.

SF runs an incremental backup, then uses that incremental and the earlier incrementals back
to the last full backup to construct a new full backup without actually running one. Hence the
name, Synthetic Full. Introduced in NetWorker version 8, SF is not supported for NDMP
backups. For a list of SF requirements consult KB article 168411:
https://support.emc.com/kb/169411. Also, more details can be found in the NetWorker
Administration guide.
Figure 7
Virtual Synthetic Full (VSF) backup is a new feature introduced in NetWorker 8.1 (it requires DD
Boost 2.6 and DD OS 5.3 or higher). It is the same as a synthetic full backup, except that it is
performed on a single Data Domain system (all full and incremental backups must reside on the
same Data Domain host). Similar to Synthetic Full, VSF uses full and partial backups to create a
new full backup. However, since the backups reside on a Data Domain system, and use the
new DD Boost APIs, the backup does not require saveset data to be sent over the wire (no
need to read the savesets), resulting in improved performance over synthetic full and traditional
backups.
What actually happens is that since NetWorker is constructing a synthetic full from savesets
stored on the Data Domain host, and since DD Boost allows NetWorker to see the file-chunk
catalogue, it does not have to read the savesets off the Data Domain host. Instead, it can use
that catalogue to construct the new SF (or VSF, in this case) without having to read all the
savesets off the Data Domain host and write the new saveset. For more details on VSF backup
execution, refer to the NetWorker Administration guide, page 88.
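Conceptually, a VSF is built by merging pointer maps rather than data. The sketch below is our own simplification (each "map" is a dict of file path to chunk-fingerprint list; file deletions are not modeled): the newest version of each file wins, and no chunk data is read or rewritten.

```python
def virtual_synthetic_full(last_full: dict, incrementals):
    """Overlay incremental file maps (in chronological order) on the last
    full backup's map to produce a new full backup's map. Only chunk
    pointers move; no saveset data is rehydrated."""
    vsf = dict(last_full)
    for inc in incrementals:
        vsf.update(inc)  # changed/new files replace the older versions
    return vsf
```

This is why VSF outperforms a conventional synthetic full: the construction cost is proportional to the metadata, not to the size of the savesets.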
Do DD Boost and Client Direct work for module backups as well as for file system
backups?
Yes. Consult your module documentation to confirm that your version has the proper support.
Of course, your client must have direct network access to the Data Domain host.
How many DD Boost devices can I configure on a SN?
You can configure as many as you need. Keep in mind that a single device can accommodate
60 sessions, so there should be no problem sharing the same device among multiple SNs as
long as backups directed to this device target the same pool.
Advantages/disadvantages of the new DD Boost over Fibre Channel (DFC)
The prerequisites are DD OS 5.3 or above, coupled with NetWorker 8.1 or above (the version
that includes DD Boost 2.6). DFC-enabled clients and SNs must be zoned to the Data Domain
host's HBA target LUNs that represent the DD Boost devices. For the complete deployment
procedure, refer to the DD OS Administration guide and the NetWorker 8.1 “NetWorker and
Data Domain Devices Integration Guide”.
DFC is another way to work around an overloaded LAN during the backup window. DFC
enables DD Boost-enabled clients and SNs to access DD Boost devices through the SAN,
minimizing bandwidth pressure over the LAN to the Data Domain host during the busy backup
window. SAN-connected clients and SNs send the data to the DD Boost devices through the
SAN, partially relieving the LAN for other tasks. The Client Direct configuration still applies to
DFC devices; clients are configurable for the required connectivity type (either IP or FC).
The only disadvantage is a minor performance penalty. When sending data over the SAN,
performance has been found to be up to 20% lower than LAN DD Boost connectivity in the
worst cases. It is worth mentioning that the client HBA settings (specifically the HBA queue
depth) have an impact on performance, as mentioned in this Data Domain article:
https://my.datadomain.com/download/kb/all/boostfc_client_qdepth.htm
DFC is currently available for Windows and Linux hosts only, but further platform support is on
the way.
Part II
This is the solution proposal we developed to answer the backup requirements of an
anonymous customer. Some figures and concepts shown below repeat material from Part 1,
but we decided to present the document at full length to preserve its integrity.
Journey to an optimized backup environment
EMC takes great pride in helping to design and implement enterprise-class backup and
recovery solutions for its customers, based on powerful, sustainable, world-class products.
EMC invests in key infrastructure-related initiatives, delivering on a strategic long-term vision.
With this vision in mind, we recently implemented a consolidation project for a large customer
to leverage their current infrastructure and consolidate their data center with EMC solutions.
The design put forth aimed to solve these business challenges:
- Cost Competitiveness
- Highest Levels of Reliability
- Ease of Management
- High Performance
- Compatibility
Cost Competitiveness
Being cost competitive is paramount in order to build and maintain business. Whether facing
economic turmoil or economic boom times, we must ensure that the solutions we offer fit our
customers' budgets.
Highest Levels of Reliability
Offering cost-effective solutions is meaningless if the solution is plagued by outages. Backup
infrastructures must be capable of performing even after suffering multiple component failures.
Customer loyalty will be lost if we fail to meet our availability obligations. We strive to offer
products and services that remain operational 24 x Forever.
An added benefit of highly reliable systems is the cost savings realized the longer the systems
remain operational. Systems capable of remaining in production 7 or more years can yield
significant long-term savings and/or profit over those capable of running production workloads
for 3 to 5 years.
Ease of Management
Hand-in-hand with being cost-effective and reliable, systems need to be as automated and easy
to use as possible—being able to do more with less. The more complex the solution, the more
resources it takes to maintain and operate over its lifecycle, driving overall cost up while
driving reliability down. Ensuring staffing levels remain stable in the face of unabated growth is
essential in cost containment and is the main reason ease of management remains a key
requirement.
High Performance
The overall solution must be capable of delivering during periods of high usage and must be
designed to eliminate congestion points. Delivering solutions that suffer from poor performance
frustrates customers and wastes precious time and resources tracking down and resolving
performance-related issues.
Compatibility
Gone are the days of implementing independent computing silos. It's expensive and difficult to
maintain solutions designed in isolation. To meet aggressive growth objectives, we need to
ensure all of the systems being deployed are compatible and work with one another. Everything
needs to work together and scale in order to keep the overall solution as simple and
manageable as possible.
The Journey
The solution was designed to meet and exceed our customer's expectations. We emphasized
leveraging their current infrastructure and enhancing the ability for future upgrades, depending
on scalable solutions and powerful products that can support all aspects of the business.
Where we were
The customer has a complex environment; the data center has many topologies for performing
backups.
Current infrastructure
An analysis of the infrastructure uncovered points of possible improvement and the importance
of having a single backup tool for improving management and reducing administration time.
Existing infrastructure
- 4 backup servers (Data Zones) running 3 different backup products:
  o HP Data Protector - main backup infrastructure
  o Dell NetVault - NDMP backup, DMZ and Fernord
  o Symantec Backup Exec - Trenord site
  o Symantec Backup Exec - infrastructure, Iseo site
- Repository: Fujitsu CS800 S2 backup-to-disk storage
Analysis based on data collected from EMC staff for ABC customer.
Legend of the table (which is written in Italian):
Ambiente = environment
Ambito aziendale = company name
Tipo di backup = backup type
Mezzo trasmissivo attuale = current transmission medium
TB giorno = TB per day; Mese = month; Anno = year
Points of improvement for the future infrastructure:
- Shorten the backup of the Exchange infrastructure via LAN-free backup mode.
- Shorten the backup of NetApp storage by increasing the number of drives used for backup.
- Shorten the backup of Oracle databases through the use of LAN-free backup mode.
- Possibly reduce the SAP infrastructure backup window by increasing the number of drives
  used by the server.
- Provide a single backup tool: a single point of management and a unified management
  methodology (NetWorker employed).
- Increase performance and disk space on the VTL over the longer term, adequate to support
  the performance improvements above (Data Domain as a candidate).
Figure 8: ABC backup architecture 1 (City 1) - NetWorker Server 8.1 with Data Domain (DD
Boost) and vStore API proxies serving SAP, Oracle, Exchange, and VMware (38 VMs)
workloads on FAS 2040/3140 storage, over 10 GbE across the TREN, ISEO, DMZ, and FERN
LANs and the TREN/FERN SANs.
Figure 9: ABC backup architecture 2 (City 2) - NetWorker Servers 8.1 protecting Exchange,
SQL Server, SAP, Oracle, Windows, and Linux hosts and VMware VMs over 10 GbE across
the FERN, ISE, DMZ, and TREN LANs.
Figure 10: ABC backup architecture 3 (City 3) - NetWorker Server 8.1 with FAS2030 storage
on the Novate SAN (8x FC), linked to the City 2 LAN over a bidirectional 1 GbE DD Boost IP
replica.
Customer’s challenges vs. solutions
The customer had many challenges in their environment, including:
- Load on the LAN
- Inefficient backups
- Low-level integration with the virtual environment
- Inability to perform a tech refresh on the fiber network
- A distributed (rather than centralized) management system for the backup environment
Steps to the solution
NetWorker
The first phase was implementing NetWorker as management software to centralize the
customer’s backup, recovery, and archiving environment. The integration features of NetWorker
enabled us to integrate it with the major components of the backup and recovery environment
(databases and application servers).
New features we were able to use after implementing NetWorker 8.1 as a central platform for backup and recovery included:
• Greater backup efficiencies, spanning integration for EMC array snapshot management, further integration with Data Domain, and new support for Block-Based Backup for Windows systems
• Optimized support for VMware backup and recovery through a new underlying VMware backup engine
• Enhanced NetWorker management on several fronts, with expanded support for enterprise applications and new features that maximize efficiencies
Snapshot management
The customer wished to simplify snapshot management and remove overhead components by integrating the solutions. The integrated snapshot management feature enabled us to eliminate the need for a separate proxy server to move snapshots: the administrator can now use the NetWorker Storage Node as the proxy in the workflow.
2014 EMC Proven Professional Knowledge Sharing 24
Use of snapshots as part of an overall data protection strategy not only enables fast operational
backup and recovery, but also allows backup to disk or tape to happen offline without impact to
the mission critical application server. This process is often referred to as “Live Backup”. Tapes
can be created and sent offsite for disaster recovery purposes. At any time, recovery can be
accomplished from a snapshot or from disk or tape as needed.
NetWorker Snapshot Management will catalog all snapshot activities, enabling quick search and
recovery for restore purposes. NetWorker software provides lifecycle policies for snapshot save
sets. Snapshot policies specify the following:
• Time interval between snapshots
• Maximum number of snapshots retained, above which the oldest snapshots are recycled
• Which snapshots will be backed up to traditional storage
• The type of snapshot to be created
• The snapshot expiration policy
• The number of active snapshots retained on the storage array
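The retention behavior in the policy above (keep at most N snapshots, recycling the oldest once the limit is exceeded) can be sketched in a few lines of Python. The class and method names here are our own illustration, not NetWorker's API:

```python
from collections import deque

class SnapshotPolicy:
    """Toy model of a snapshot lifecycle policy: keep at most
    max_retained snapshots, recycling the oldest on overflow."""

    def __init__(self, max_retained):
        self.max_retained = max_retained
        self.snapshots = deque()

    def take_snapshot(self, name):
        self.snapshots.append(name)
        recycled = []
        # Recycle (delete) the oldest snapshots above the retention limit.
        while len(self.snapshots) > self.max_retained:
            recycled.append(self.snapshots.popleft())
        return recycled

policy = SnapshotPolicy(max_retained=3)
for hour in range(5):
    policy.take_snapshot(f"snap-{hour:02d}")

print(list(policy.snapshots))  # only the three most recent snapshots survive
```

A real policy also carries the interval, expiration, and rollover attributes listed above; only the recycle-on-overflow rule is modeled here.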
Snapshots for DB2, Oracle, and SAP are also managed via the NetWorker Snapshot
Management feature. Configuration Wizard support for these applications will be added in a
later release.
NetWorker Snapshot Management operations for each NetWorker client can be monitored
through NMC reporting features. Monitored operations cover snapshots that are successfully
created or in progress, as well as snapshots that are mounted, in the process of being rolled
over, and deleted. Reports include details of licensed capacities consumed. NMC also provides
a detailed log of snapshot operations.
Snapshot Management is included with a NetWorker capacity-based license.
The Client Configuration Wizard for the NetWorker Snapshot Management feature enables
automatic discovery of the environment that has been configured for snapshots by the Storage
Administrator. The Wizard accommodates the common NetWorker Snapshot Management
workflows associated with snapshot and rollover configurations.
2014 EMC Proven Professional Knowledge Sharing 25
Snapshot validation will verify whether a backup as configured by the Wizard is likely to be
successful.
Simplify the process
No scripting is required. The Configuration Wizard will ensure that the proper commands are
executed for the associated snapshot operations, that the LUNs are paired appropriately, and
that all NetWorker resources are properly assigned. Basically the Wizard will take care of
configuring, end-to-end, the client snapshot/rollover policy.
Data Domain
DD Boost inside
Thanks to its ease of management and its deeper integration with NetWorker, Data Domain enabled us to eliminate the need to tape out, via the new Data Domain Boost over Fibre Channel feature.
Support for the Fibre Channel protocol has now been added to DD Boost and NetWorker 8.1
leverages it for customers who have standardized on Fibre Channel as their backup protocol of
choice. This support not only optimizes the customers’ existing investment in their Fibre
Channel infrastructure, but with DD Boost client-side deduplication, the customer can now enjoy
50% faster backups over their traditional VTL-based model and 2.5x faster recovery.
Data Domain systems reduce the bandwidth required on the network, as well as the disk
capacity required. Since this support offers both client-side deduplication and support of the
Fibre Channel protocol using a backup-to-disk workflow, the old VTL ‘tape-based’ management
can be eliminated. This results in greater reliability and less complexity. This support also brings to Fibre Channel all the features that Data Domain and DD Boost offer, including virtual synthetic full backups, clone-controlled replication, global deduplication, and more.
DD Boost over Fibre Channel is supported for Windows and Linux environments.
Previously, when performing full backups, all data had to be sent from the backup server to the Data Domain system. With DD Boost, only unique data is sent from the backup server or the client to the DD system. This means up to 99% less data moved across the already loaded network, even for full backups.
This enabled us to use the current LAN/SAN infrastructure resources more efficiently. In fact, when DD Boost can be leveraged at the client level (EMC NetWorker, Avamar®, and Oracle RMAN), this bandwidth advantage spans the entire backup path, all the way from the client to the Data Domain system.
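The bandwidth saving works roughly like this: the client chunks and fingerprints its data, asks the target which fingerprints it already holds, and ships only the missing chunks. Below is a minimal sketch; the fixed-size chunks and SHA-256 fingerprints are simplifying assumptions of ours (Data Domain actually uses variable-length segmentation), and the class and function names are invented for illustration:

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks for simplicity

def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

class DedupTarget:
    """Toy stand-in for the deduplicating storage system."""
    def __init__(self):
        self.store = {}  # fingerprint -> chunk

    def missing(self, fingerprints):
        # Tell the client which fingerprints it does not already hold.
        return [fp for fp in fingerprints if fp not in self.store]

    def write(self, chunks_by_fp):
        self.store.update(chunks_by_fp)

def boost_style_backup(data: bytes, target: DedupTarget) -> int:
    """Send only unique chunks; return bytes actually transferred."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    fps = [fingerprint(c) for c in chunks]
    needed = set(target.missing(fps))
    to_send = {fp: c for fp, c in zip(fps, chunks) if fp in needed}
    target.write(to_send)
    return sum(len(c) for c in to_send.values())

target = DedupTarget()
# 100 distinct 4 KiB chunks of sample data
full = b"".join(i.to_bytes(2, "big") * 2048 for i in range(100))
sent_first = boost_style_backup(full, target)   # first full: everything is new
sent_second = boost_style_backup(full, target)  # repeat full: nothing crosses the wire
print(sent_first, sent_second)
```

The second full transfers zero payload bytes, which is the effect the article describes: a "full" backup that costs almost nothing on an already loaded network.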
Figure 11
In our backup environment, we had been experiencing bandwidth choking during full backups. DD Boost provided significant performance improvements here and helped us avoid infrastructure upgrades.
Figure 12
Reduce the workload on the backup servers
We were restricted from adding new components to the current environment, and the solution we proposed needed to use some components of the backup servers already in use.
Though we thought that moving some of the deduplication work from the Data Domain system to the backup server would negatively impact the backup server, the good news was that this was not the case. It might seem counterintuitive but, as it turns out, sending less data significantly reduces the load on the server. In other words, it takes fewer CPU cycles to assist with two steps of the deduplication process than it takes to push full backups over Ethernet.
Virtual Synthetics
Figure 13
Virtual Synthetic Full backups are an out-of-the-box integration with NetWorker, making the workflow 'self-aware.' Because our customer now uses a Data Domain system as the backup target, NetWorker uses Virtual Synthetic Full backups by default whenever a synthetic full backup is scheduled, optimizing incremental backups for file systems.
Virtual synthetics reduce the processing overhead associated with traditional synthetic full
backups by using metadata on the Data Domain system to synthesize a full backup without
moving data across the network. Unlike other vendors, no Storage Node/Media server is
required, and there is no rehydration during the recovery.
In this workflow, an initial full backup is sent to Data Domain, taking full advantage of Data Domain value-add features, namely DD Boost. Incremental backups then run daily as usual; when the next full backup would normally be due, another incremental is run instead, followed by a Virtual Full.
In a Virtual Synthetic Full backup, NetWorker sends commands to the Data Domain System of
what regions are required to create a full backup, but no data is transferred over the network.
Instead, the regions of the full backup are synthesized from the previous full and incrementals
already on the system by using pointers. This process eliminates the data that needs to be
gathered from the file server, reducing system overhead, time to complete the process, and
required network bandwidth.
This workflow is repeated over the following weeks, with a new traditional full backup
recommended only after every 8-10 Virtual Full backups have been completed. Therefore, the
use of Virtual Synthetic Full backups also reduces the number of traditional full backups from 52
to 6 per year – up to 90% reduction in full backups annually.
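The synthesis step can be modeled as a pointer merge: the new full is assembled from references to regions already on the system, with later incrementals overriding earlier versions of each file, and no client data crossing the network. The function and segment names below are illustrative, not the NetWorker/Data Domain API:

```python
def synthesize_full(previous_full: dict, incrementals: list) -> dict:
    """Build a new 'full' save set purely from pointers to data already
    on the target: later incrementals win for each file."""
    synthetic = dict(previous_full)   # start from the last full's pointers
    for inc in incrementals:          # apply each incremental in order
        synthetic.update(inc)         # newer pointer replaces the older one
    return synthetic

# file -> pointer to a region already stored on the Data Domain system
full = {"/etc/hosts": "seg-001", "/var/app.db": "seg-002"}
incs = [
    {"/var/app.db": "seg-010"},                            # Monday's changes
    {"/etc/hosts": "seg-020", "/var/new.log": "seg-021"},  # Tuesday's changes
]
virtual_full = synthesize_full(full, incs)
print(virtual_full)
```

Because only pointers move, a Virtual Full costs metadata operations rather than a full data pass, which is why a traditional full is needed only after every 8-10 virtual fulls.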
Avamar
As illustrated in the initial diagram of the customer's environment, the customer needed to add two remote sites to support the business, and thus also needed to back up these sites. We considered the best solution to support this new structure without requiring major changes to the network design.
We suggested integrating the powerful new capabilities of DD Boost under the umbrella of NetWorker.
Here is the design we proposed to support all the data center activities.
Figure 14
Our proposal offers the maximum benefits of the marriage between Avamar and Data Domain,
where Avamar clients send the data directly to the Data Domain system. Specifically, this
integration will provide the Data Domain system scalability and performance advantages for the
most challenging backup workloads including VMware image backups, NDMP, FS, and
enterprise applications, such as Oracle, DB2, MS Exchange, MS SQL databases, and MS
SharePoint. This greatly optimizes LAN bandwidth and multiplies the advantage of distributing some of the deduplication effort across hundreds of clients, improving the performance of the overall backup cycle.
Additionally, DD Boost supports Avamar instant access to the virtual machines stored on the
Data Domain system, which is of great benefit during system restores.
Backing up Oracle and SAP databases
As referenced at the beginning of the article, the customer runs Oracle databases and SAP applications. While shaping the design, we remained sensitive to these applications' requirements. We asked the DBAs about their backup preferences and found that they preferred a full backup every day, and sometimes more than one per day, depending on the criticality of the database.
The challenge here is: can the system and the current infrastructure support such a workload? Under normal conditions the backup window can exceed 12 hours for a full backup, on top of data growth over time.
In testing the difference DD Boost can make, we found that the full backup window can be reduced to 8 hours. DD Boost also gives the DBAs the ability to administer everything through RMAN, eliminating the need to rely on the backup administrators, and provides them a full RMAN catalog of both the local and DR sites.
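In practice, DBA-driven backups over DD Boost are run by allocating an RMAN SBT channel that loads the Data Domain plugin library. The fragment below is a sketch only: the library path, storage-unit name, and hostname are placeholders that vary per installation.

```
RUN {
  ALLOCATE CHANNEL ch1 DEVICE TYPE SBT_TAPE
    PARMS 'SBT_LIBRARY=/path/to/libddobk.so,
           ENV=(STORAGE_UNIT=oracle_su, BACKUP_HOST=dd01.example.com)';
  BACKUP DATABASE PLUS ARCHIVELOG;
  RELEASE CHANNEL ch1;
}
```

With the channel defined this way, the DBAs schedule and catalog backups entirely from RMAN, with no backup-administrator intervention.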
Figure 15
“Virtualized Environments”
VMware
Optimizing VMware Backup and Recovery
Integrating with the industry-leading Avamar technology for backup of VMware environments is
a major feature of EMC NetWorker. VMware has chosen Avamar technology to power its
recently announced vSphere Data Protection (VDP) and vSphere Data Protection-Advanced
(VDP-A) support. Now, that same technology has been leveraged in NetWorker, thus enabling
Change Block Tracking technology for both backup and recovery of data, as well as a multi-
streaming centralized proxy that will also load balance jobs between proxy servers for increased
VM backup performance, and many other features. Since each backup captures all blocks changed since the last one, every backup is essentially a full backup.
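Changed Block Tracking boils down to recording which blocks were written since the last backup and shipping only those; replaying the chain of deltas over the prior image always yields a complete image, which is why each backup restores as a full. A toy model follows, with names of our own invention rather than VMware's actual CBT API:

```python
class TrackedDisk:
    """Toy virtual disk that records blocks written since the last backup."""
    def __init__(self, num_blocks):
        self.blocks = [b"\x00"] * num_blocks
        self.changed = set()

    def write(self, index, data):
        self.blocks[index] = data
        self.changed.add(index)   # CBT: remember which blocks changed

    def backup(self):
        # Ship only the changed blocks, then reset the tracking set.
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta

def restore(base_image, deltas):
    """Overlay the chain of deltas on a base image to get a full image."""
    image = list(base_image)
    for delta in deltas:
        for i, data in delta.items():
            image[i] = data
    return image

disk = TrackedDisk(4)
disk.write(0, b"A")
disk.write(2, b"B")
first = disk.backup()            # first backup captures blocks 0 and 2
disk.write(2, b"C")
second = disk.backup()           # next backup ships only block 2
restored = restore([b"\x00"] * 4, [first, second])
print(restored)
```

Each individual backup moved only the changed blocks, yet the restore reproduces the whole disk image.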
NetWorker uses a software-based VMware Backup Appliance (VBA). The VBA stores the
metadata, sending changed blocks during the backup workflow to a Data Domain System
target. This support is specific to, and optimized by, Data Domain. Therefore, the customer
enjoys all the features and value from a Data Domain solution including DD Boost support,
clone to tape for retention and compliance, and global deduplication, to name a few. Each VBA
is capable of protecting hundreds of virtual machines ensuring protection for the largest virtual
environments.
Now our customer has the option to clone from Data Domain to tape or other external media for
extended retention and compliance purposes.
In-guest protection is enabled by NetWorker Modules for application consistency. The new VMware engine is supported to co-exist with the legacy support in NetWorker 8.0 and earlier, primarily for customers who still have a requirement to back up directly to tape using physical proxies.
Managing VMware backup and recovery
Through direct integration with VMware vCenter, we offered a collaborative approach to backup
management that empowers the VMware Administrator to manage their own backups, while the
Backup (NetWorker) Administrator maintains visibility and control of corporate SLAs through
policy-setting, monitoring, and reporting.
Both VMware and Backup Administrators are empowered with visibility and control of the
environment. Protection is based on policies, as defined by the Backup Administrator, and
selected for each virtual machine, or group of virtual machines, by the VMware Administrator.
Virtual machines are auto-discovered and automatically protected based on the policies
assigned to the group where they are created.
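The auto-protection workflow above reduces to a simple rule: a virtual machine inherits whatever policy the Backup Administrator attached to the group in which it is created. A minimal sketch, with the group and policy names invented for illustration:

```python
class ProtectionGroup:
    """Toy model of policy-based auto-protection: every VM created in
    a group is automatically protected by that group's policy."""

    def __init__(self, name, policy):
        self.name = name
        self.policy = policy
        self.vms = []

    def create_vm(self, vm_name):
        # Auto-discovery: no per-VM backup configuration is needed;
        # the group's policy applies as soon as the VM appears.
        self.vms.append(vm_name)
        return {"vm": vm_name, "policy": self.policy}

prod_db = ProtectionGroup("prod-db", policy="daily-full-30d")
assignment = prod_db.create_vm("oracle-vm-01")
print(assignment)
```

This split of duties mirrors the text: the Backup Administrator defines the policies, and the VMware Administrator simply places VMs into the right group.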
Both image and file level recovery are supported. Since this feature support is enabled by
integration with VMware vCenter, management is virtual-centric, with information on the
VMware environment presented as VM groups and folders. File level recovery is supported for
both Windows and Linux.
Enhanced Management and Enterprise Applications Support
Figure 16
The last thing we offered our customer was the new NetWorker plugin that provides a panoramic view of the backup environment: at no additional cost, we introduced the new EMC Backup and Recovery Manager, an intuitive management interface for monitoring and reporting on NetWorker and Avamar through a single pane of glass. While primarily used for NetWorker
and Avamar, it will also support monitoring of Data Domain Systems from the backup
administrator’s perspective. Operators and administrators can monitor alerts, activities, and
systems. It also monitors events, which are informational messages, useful for troubleshooting
and auditing. Reporting features enable customers to confirm that client systems are being
properly protected and also track system usage and capacity. It offers a dashboard approach
providing all key information on a single screen, including alerts and warnings. Other key
usability features include filters, grouping, search, and color-coded tracking for system capacity.
New core NetWorker features focus on management simplicity and usability, including an integrated, wizard-based recovery graphical user interface available directly from the NetWorker Management Console. This GUI walks the Administrator through every step of the recovery
process, including recovery of snapshots, file systems, and the new Block Based Backups. It
enables a recovery operation to be scheduled and can also perform multiple recovery
operations at once.
The NetWorker server DR process has been simplified and replaces the current manual multi-
step process in the event a NetWorker server goes down. Features include self-awareness such
that if the bootstrap server ID is unknown, the system will initiate a scanner process. Now, the
Backup Administrator is stepped through the process of recovery, without having to pull out a
complicated manual to follow.
The command line wizard program automates the recovery of the NetWorker server’s media
database, resource files, and client file indexes. The administrator can choose to recover just
the media database, the resource files, the client file indexes – or all of the above.
Appendix
1. NetWorker 8.1 countdown: http://nsrd.moab.be/2013/07/12/networker-8-1-countdown-2/
2. "Why EMC DD series", EMC doc number h11755
3. "V to the MAX" by John Bowling, Knowledge Sharing article, 2012
Biography
Mohamed Sohail
Mohamed has over 9 years of IT experience in operations, implementation, and support; 4 of
them with EMC. Mohamed previously worked as a Technical Support Engineer at Oracle Egypt
and was a Technical Trainer at Microsoft. Mohamed holds a B.Sc. in Computer Science from Sapienza University of Rome, Italy, and a B.A. in Italian from Ain Shams University, Egypt.
Mohamed holds EMC Proven Professional Backup Recovery certification.
Shareef Bassiouny
A Backup Recovery NetWorker Specialist, Shareef is a Technical Support engineer in the GTS
Organization at EMC. Shareef has over 12 years of experience in IT operations,
implementation, and support; more than 3 of those spent with EMC NetWorker support. Shareef
holds a B.Sc. in Telecommunication Engineering from Cairo University. His previous role was
leading a Dedicated IT Customer Support Desk that handled Data Center Operation and
Change Management at Orange Business Services.
Giovanni Gobbo
With 20 years of experience in the IT field ranging from Microsoft, Linux, Unix, VMware,
Storage, and Backup environments, Giovanni has solid hands-on experience implementing,
managing, and planning physical and virtualized computer infrastructure.
Giovanni has worked for Atlantica, Terasystem, Getronics, and Olivetti.
EMC believes the information in this publication is accurate as of its publication date. The
information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION
MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO
THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an
applicable software license.