© 2014 VMware Inc. All rights reserved.
Implementing a Holistic BC/DR Strategy with VMware
Roberto Barbero
Solution Architect
VMware vForum, 2014
What’s on the agenda?
• Defining the problem
• Definitions
• VMware technologies that provide BC and DR
– vSphere HA and App HA
– vSphere FT
– vSphere Data Protection / Advanced
– vCenter Availability
– vSphere Replication
– vCenter Site Recovery Manager (SRM)
– vCenter Infrastructure Navigator (VIN)
• Find out more
IT Business Continuity
Is It a Real Problem?
UK Bank group
An outage in June 2012 prevented millions of customers from receiving or making payments and lasted for almost an entire week.
£125 Million
What’s the Difference?
Disaster Avoidance
Disaster Recovery
Planned vs. Unplanned
Disaster Recovery vs. Business Continuity
Example: Tuesday, August 23, 2011 at 1:51 PM EDT - Magnitude 5.8 earthquake near Mineral, Virginia
Disaster recovery required?
No
Interruption to business continuance?
YES!
Fault Tolerance vs. High Availability
• Fault tolerance
– Ability to recover from component loss
– Example: Hard drive failure
• High availability
Uptime percentage in one year    Downtime in one year
99%                              3.65 days
99.9%                            8.76 hours
99.99%                           52 minutes
99.999% (“five nines”)           5 minutes
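The downtime figures above follow directly from the uptime percentage. A minimal sketch of the arithmetic, assuming a 365-day year (which matches the 3.65-day figure for 99%); the function name `annual_downtime` is illustrative and not part of any VMware tool:

```python
from datetime import timedelta

# Assumption: a non-leap year of 365 days, which reproduces the 3.65 days quoted for 99%.
HOURS_PER_YEAR = 365 * 24


def annual_downtime(uptime_percent: float) -> timedelta:
    """Return the downtime per year implied by an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    unavailable_fraction = (100 - uptime_percent) / 100
    return timedelta(hours=HOURS_PER_YEAR * unavailable_fraction)


if __name__ == "__main__":
    for pct in (99, 99.9, 99.99, 99.999):
        print(f"{pct}% uptime allows {annual_downtime(pct)} of downtime per year")
```

The unrounded values for the last two rows are about 52.6 minutes and 5.3 minutes; the slide rounds them down.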
RTO, RPO, and MTD
• Recovery Time Objective (RTO)
– How long it should take to recover
• Recovery Point Objective (RPO)
– Amount of data loss that can be incurred
• Maximum Tolerable Downtime (MTD)
– Downtime that can occur before significant loss is incurred
– Examples: Financial, reputation
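To make the three terms concrete, the sketch below compares one outage against a set of objectives. Everything here is illustrative: the `Objectives` and `Outage` types, the field names, and the sample times are assumptions, not part of any VMware product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Objectives:
    rto: timedelta  # how long recovery should take
    rpo: timedelta  # how much data loss (measured in time) can be incurred
    mtd: timedelta  # downtime before significant loss is incurred


@dataclass(frozen=True)
class Outage:
    last_good_copy: datetime    # most recent recoverable copy of the data
    failure: datetime           # when the service went down
    service_restored: datetime  # when the service was usable again


def evaluate(outage: Outage, objectives: Objectives) -> dict:
    """Compare the achieved recovery time and data-loss window with the objectives."""
    recovery_time = outage.service_restored - outage.failure
    data_loss_window = outage.failure - outage.last_good_copy
    return {
        "rto_met": recovery_time <= objectives.rto,
        "rpo_met": data_loss_window <= objectives.rpo,
        "within_mtd": recovery_time <= objectives.mtd,
    }


if __name__ == "__main__":
    objectives = Objectives(rto=timedelta(hours=4), rpo=timedelta(minutes=15), mtd=timedelta(hours=24))
    outage = Outage(
        last_good_copy=datetime(2014, 1, 1, 12, 50),
        failure=datetime(2014, 1, 1, 13, 0),
        service_restored=datetime(2014, 1, 1, 18, 0),
    )
    print(evaluate(outage, objectives))
```

In this hypothetical outage the data-loss window (10 minutes) is within the RPO, the recovery time (5 hours) misses the RTO, and the business is still inside its MTD.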
Making an Application Service Highly Available
• vSphere HA
• NEW: vSphere App HA
vSphere App HA (New)
• Policy-based
• Protect off-the-shelf apps
[Diagram labels: VMware vFabric™ tc Server; vSphere HA cluster of vSphere hosts; vCenter Server; vFabric Hyperic virtual appliance; vSphere App HA virtual appliance; Hyperic agents running in VMs]
vSphere HA – Keep In Mind…
• RTO – measured in minutes (not seconds)
• Requires shared storage
• Best practices
– Use admission control – percentage policy
– Test post-failure performance with host maintenance mode
– Isolation response – leave powered on
– Network and storage redundancy
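For the percentage-based admission control policy, a common rule of thumb is to reserve the share of cluster capacity that the tolerated number of failed hosts would take away. The sketch below assumes all hosts are identically sized, which real clusters often are not; the function name is hypothetical and this is not a vSphere API call.

```python
def reserved_capacity_percent(hosts: int, host_failures_to_tolerate: int = 1) -> int:
    """Percentage of cluster CPU/memory to reserve, rounded up to a whole percent.

    Assumption: every host contributes the same capacity, so losing F of N hosts
    removes F/N of the cluster.
    """
    if hosts < 2:
        raise ValueError("an HA cluster needs at least two hosts")
    if not 0 < host_failures_to_tolerate < hosts:
        raise ValueError("failures to tolerate must be between 1 and hosts - 1")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (100 * host_failures_to_tolerate + hosts - 1) // hosts


if __name__ == "__main__":
    for n in (3, 4, 8):
        print(f"{n} hosts, tolerate 1 failure: reserve {reserved_capacity_percent(n)}%")
```

With hosts of different sizes, the reservation would need to cover the largest hosts rather than the average one, so treat the result as a starting point and confirm it with the maintenance-mode test mentioned above.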
vSphere Fault Tolerance (FT)
• Zero recovery time, zero data loss
– Host hardware failure only
– Does not protect against OS and application failure
• Works fine with HA, App HA
• Why not FT?
– Resource requirements – does workload really need it?
– VM has multiple vCPUs – FT protects single-vCPU VMs only
– No VM snapshots – backups require agent
Data Protection (Backup and Restore)
• Agents? No Agents? – Both!
– No agents for majority of workloads – keep it simple
– Agents for certain apps
• vSphere Data Protection (VDP) Advanced
– Backup and recovery for VMware, from VMware
– Based on proven, mature EMC Avamar™
– Agent-less VM backup and restore
– Agents for granular tier-1 application protection
vSphere Data Protection (New)
VDP Advanced – Keep In Mind…
• Engineered for SMB environments
• Uses VADP – VM snapshots, CBT
• Utilizes Windows VSS in VMware Tools
• Works fine with HA, not with FT
• RDM – virtual yes, physical no
• Is it DR?
– Maybe – depends on RTO, RPO
– Needs replication offsite, right?
VDP Advanced – Keep In Mind…
• Best Practices
– Prepopulate DNS, always use FQDN
– Manage VM snapshots
– Avoid deploying to slow storage
– Do not power-off, always shut down gracefully
– Do not schedule backups during maintenance window
vCenter Availability
• Run vCenter Server application in a VM
• Run vCenter Server database in a VM
• Run both in same VM?
• Protect with vSphere HA
– vCenter and DB VM restart priority set to High
– Enable guest OS and App monitoring
• App HA can protect SQL Server database
vCenter Availability
• Back up vCenter Server VM and database
– Image-level backup for vCenter Server VM
– App-level backup using agent for database backup
• Why not FT for vCenter Server?
– vCenter Server requires minimum of 2 vCPUs
– FT does not protect against application failure
• Replicate vCenter Server, database VMs?
vSphere Replication – DR
• Native tool built into the platform
• Per-VM hypervisor replication, managed in VC
• Selectable RPO from 15 minutes up to 24 hours
• Selectable destination datastore (disk-type agnostic)
Replication Across Sites
[Diagram labels: vCenter Server at each site; ESXi hosts, each with VRA and NFC; VR Appliance; Storage; VMDK1]
Four Steps for Full Recovery
Right-click, select “Recover”
Select a target folder
Select a target resource
Click Finish
Your choices are validated as you go
New Feature – Retain Historical Replicas
• Retention of multiple points in time (MPIT) allows reversion to earlier known good states
• After recovery, use the snapshot manager to revert to earlier points
[Diagram labels: vSphere, VR Agent]
MPIT Presented as VM Snapshots after Failover
Use the snapshot manager to revert to earlier points, an interface all administrators have been comfortable with for many years.
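Choosing which retained point to revert to usually comes down to finding the most recent one taken before the problem appeared. A minimal sketch of that selection, using plain timestamps; it does not call any vSphere Replication or snapshot API, and the function name is hypothetical:

```python
from datetime import datetime
from typing import Optional, Sequence


def latest_good_point(points: Sequence[datetime], corrupted_at: datetime) -> Optional[datetime]:
    """Return the newest retained point strictly older than the corruption time.

    Returns None when every retained point was taken at or after the corruption.
    """
    candidates = [p for p in points if p < corrupted_at]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    retained = [
        datetime(2014, 3, 1, 8, 0),
        datetime(2014, 3, 1, 12, 0),
        datetime(2014, 3, 1, 16, 0),
    ]
    print(latest_good_point(retained, corrupted_at=datetime(2014, 3, 1, 13, 30)))
```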
vSphere Replication – Interoperability
Fault Tolerance – doesn’t work with VR
• FT conflicts at the vSCSI disk filter level
VDP
• Mostly no problem!
• If using VSS… ensure you are using 5.5!!
HA, vMotion, DRS
Storage vMotion and Storage DRS
• Now supported!
SRM
What is it?
• A disaster recovery engine
• A tool that uses externally replicated data (VR or array-based) to speed the RTO of a BCP
• A product that allows for DR to be tested, automated, planned, repeatable and customizable
What is it not?
• A replication engine
• A tool for systems that need near-instant RPO
• A disaster avoidance stretched cluster
Key Components of SRM
• vCenter Server: one vCenter Server (Windows or VCVA) per site, same versions
• SRM Server: one SRM Server per site, same versions
• vSphere hosts: recommend same versions per site (pre vSphere 5.x only if using array replication)
• Replication
vSphere Essentials Plus and higher editions supported
SRM Replication Options
• SRM can utilize BOTH array-based AND vSphere Replication
• SRM will “see” existing standalone vSphere Replication protected VMs
• SRM can install vSphere Replication from scratch if needed
[Diagram labels: vSphere Replication; Storage-based Replication; LUN 1; LUN 2; Hub; Multi-tier App (Web, App, DB)]
Recovery Workflows
Failover Automation
• User-defined recovery plan
• Minimize errors
Non-disruptive Failover Testing
• Isolated test environment
• Increase confidence in DR process
Planned Migration
• Zero data loss
• Operational migration
Failback Automation
• Re-protect VMs, migrate back
SRM Interoperability
• Works with VR and ABR
• Backups, VADP or other, are fine
• HA is no problem at all
• vMotion and DRS are fine
• Storage vMotion and Storage DRS – sort of…
– Replication dependent
• FT is “yellow”
– Array replicated only, and the FT status is not recovered
• Web vs vSphere Client
SRM – A Few Best Practices
Big ones:
• Storage layout
• Test network configuration
• Test often!
• Size vCenter correctly
Biggest one:
• Do a Business Impact Analysis
– RPO, RTO, cost of downtime, interdependencies, criticality of applications, priorities, units of failover, overlooked externalities, executive buy-in, …
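Part of a Business Impact Analysis can be captured as a small data set: what each application costs per hour of downtime, its RTO, and what it depends on. The sketch below ranks applications by cost of downtime and flags dependencies whose RTO is longer than that of the application relying on them. The `App` type, field names, and sample figures are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class App:
    name: str
    cost_per_hour: float  # estimated cost of downtime per hour
    rto_hours: float      # agreed recovery time objective
    depends_on: Tuple[str, ...] = field(default_factory=tuple)


def failover_priority(apps: List[App]) -> List[str]:
    """Order applications by cost of downtime, most expensive first."""
    return [a.name for a in sorted(apps, key=lambda a: a.cost_per_hour, reverse=True)]


def rto_conflicts(apps: List[App]) -> List[Tuple[str, str]]:
    """Return (app, dependency) pairs where the dependency recovers later than the app needs."""
    by_name: Dict[str, App] = {a.name: a for a in apps}
    conflicts = []
    for app in apps:
        for dep_name in app.depends_on:
            dep = by_name[dep_name]
            if dep.rto_hours > app.rto_hours:
                conflicts.append((app.name, dep.name))
    return conflicts


if __name__ == "__main__":
    inventory = [
        App("web", cost_per_hour=5000, rto_hours=1, depends_on=("db",)),
        App("db", cost_per_hour=2000, rto_hours=4),
        App("reporting", cost_per_hour=100, rto_hours=24, depends_on=("db",)),
    ]
    print(failover_priority(inventory))
    print(rto_conflicts(inventory))
```

In this sample the web tier promises a one-hour RTO while its database is only committed to four hours, which is the kind of interdependency a BIA is meant to surface before a recovery plan is built.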
Protection Groups (PGs)
• More PGs = more granular testing/failover
– DR testing is easier – fewer resource requirements
– Fail-over only what is needed
– More configuration/complexity
• Fewer protection groups = less complex
– Fewer LUNs, PGs, recovery plans
– Less flexibility
Fewer LUNs/PGs: less complexity, less flexibility
More LUNs/PGs: more complexity, more flexibility
Majority of outages are partial (not entire data center) – design accordingly
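One way to reason about units of failover when planning the storage layout is to treat VMs that share a LUN as having to move together. The sketch below groups VMs into such units from a VM-to-LUN mapping. It is a simplified planning model with hypothetical names, not how SRM itself computes protection groups.

```python
from collections import defaultdict
from typing import Dict, List, Set


def failover_units(vm_luns: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group VMs that are linked, directly or transitively, by shared LUNs."""
    # Invert the mapping so each LUN lists the VMs stored on it.
    lun_vms: Dict[str, Set[str]] = defaultdict(set)
    for vm, luns in vm_luns.items():
        for lun in luns:
            lun_vms[lun].add(vm)

    seen: Set[str] = set()
    units: List[Set[str]] = []
    for vm in sorted(vm_luns):  # sorted for a deterministic result order
        if vm in seen:
            continue
        unit: Set[str] = set()
        stack = [vm]
        while stack:  # depth-first walk over VMs connected through LUNs
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            unit.add(current)
            for lun in vm_luns[current]:
                stack.extend(lun_vms[lun] - seen)
        units.append(unit)
    return units


if __name__ == "__main__":
    layout = {
        "web": {"LUN1"},
        "app": {"LUN1", "LUN2"},
        "db": {"LUN2"},
        "mail": {"LUN3"},
    }
    print(failover_units(layout))
```

In the sample layout, the app VM spans two LUNs and so ties web, app, and db into a single unit, while mail can be tested or failed over on its own. Spreading VMs across more LUNs yields more, smaller units at the cost of more configuration.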
Test Network
– Use VLAN or isolated network for test environment
• Default “Auto” setting does not allow VM communication between hosts
– Different vSwitch can be specified in SRM for test versus run
• Specified in Recovery Plan
vCenter Infrastructure Navigator (VIN)
VMware – Multiple Levels of Protection
[Diagram: Site A – SQL, protected by vSphere HA/FT and VDPA; Site B – SQL, via VR/SRM]
Find Out More
• Take an online hands-on lab: http://labs.hol.vmware.com/
• Ask for a demo
• Install 60-day evaluation
Thank You