1
Business Continuity & Disaster Recovery - Protecting Your Customers' Mission Critical Environment
Sin Cheong Wong Senior Systems Consultant VMware ASEAN
2
Disasters Happen. Do You Need Protection?
43% of companies experiencing disasters never re-open, and 29% close within two years (McGladrey and Pullen)
93% of businesses that lost their data center for 10 days went bankrupt within one year (National Archives & Records Administration)
40% of all companies that experience a major disaster will go out of business if they cannot gain access to their data within 24 hours (Gartner)
Top executives say 10 hours to recovery; IT managers say up to 30 hours (Harris Interactive)
3
Business-Critical Applications Require Business Continuity
Availability Expectations on vSphere Continue to Increase
RTOs decreasing from >24 hours to <12 hours
% of Application Instances Running on VMware in Customer Base
Workload            2010   2011
MS Exchange         38%    42%
MS SQL              43%    47%
MS SharePoint       53%    67%
Oracle Middleware   25%    34%
Oracle DB           25%    28%
SAP                 18%    28%
Source: VMware customer survey, Jan 2010 and April 2011 interim results. Data: total number of instances of that workload deployed in your organization and the percentage of those instances that are virtualized
4
Tradeoffs Of Traditional Business Continuity Solutions
Middleware / Java
Oracle RAC
Oracle DataGuard DB Mirroring
MS Clustering
DB Access Groups
CCR / SCR
App Server Cluster
Session State Replication
Backup Data replication
Application-level availability silos: Complex and expensive
Data protection services: Longer RTOs and RPOs
5
Challenges of Traditional Disaster Recovery
Expensive/ Dependencies
Complex Recovery Plans
Unreliable Failovers
Apps
Hosts
Storage
Network
Software
Hosts
Storage
Facilities
>$10K per app
Failure to meet business requirements
• Long RTOs – days to weeks
• Too much time and resources consumed
6
So How To Overcome This?
7
vSphere Provides The Best Foundation For Disaster Recovery
Flexible Infrastructure
• Eliminate need for identical hardware across sites
• Enable waterfalling of equipment to recovery site
Simple Application Protection
• Entire system – including application, OS, and data – is stored as virtual machine files
• Entire system can be protected with data protection tools
Cost-Efficient Infrastructure
• Reduced hardware requirements at recovery site
• Use recovery hardware to run low-priority apps
Encapsulation
Consolidation
Hardware Independence
vSphere
vSphere vSphere
Automation is needed to lower risk, increase confidence
8
vCenter Site Recovery Manager Ensures Simple, Reliable DR
Provide cost-efficient replication of applications to failover site
• Built-in vSphere Replication
• Broad support for storage-based replication
Simplify management of recovery and migration plans
• Replace manual runbooks with centralized recovery plans
• From weeks to minutes to set up new plan
Automate failover and migration processes for reliable recovery
• Enable frequent non-disruptive testing
• Ensure fast, automated failover
• Automate failback processes
Site Recovery Manager Complements vSphere to provide the simplest and most reliable disaster protection and site migration for all applications
VMware vSphere
VMware vCenter Server
Site Recovery Manager
VMware vCenter Server
Site Recovery Manager
VMware vSphere
Site A (Primary) Site B (Recovery)
Servers Servers
9
Simple Setup And Management of Recovery And Migration Plans
From Complex Runbooks…
§ Weeks or months to set up
§ Error-prone
§ Quickly falls out of sync with apps and infrastructure changes
…to Simple Recovery Plans
§ Simple recovery plan set up in minutes
§ Fewer steps means far less room for errors
§ Simple to keep in sync with changes
10
Risk With Infrequent DR Plan Testing
Unproven Recoverability
Time DR Test DR Test
Changes to Applications & Infrastructure Configuration
TESTING GAP
Recovery Risk
IT Environment without Virtualization & DR Automation
Infrequent DR testing = high risk, low confidence
11
Frequent DR Testing Reduces Risk
SRM facilitates frequent testing of recovery plans
Virtualization & DR automation reduce recovery risk
Chart: Recovery Risk over Time, with frequent DR tests (Virtualization + DR Automation)
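The effect of test frequency can be sketched with a deliberately simple model. This is an illustration only, not an SRM metric: it assumes recovery risk is proportional to the number of application and infrastructure changes made since the last DR test, and the change rate and test intervals are made-up values.

```python
def peak_untested_changes(changes_per_week: float, weeks_between_tests: int) -> float:
    """Number of changes that have piled up, untested, just before the next DR test."""
    return changes_per_week * weeks_between_tests


# Assumed figures for illustration: 5 changes per week.
annual_testing = peak_untested_changes(5, 52)   # one DR test per year
monthly_testing = peak_untested_changes(5, 4)   # one DR test every 4 weeks
```

Under these assumptions the testing gap shrinks in direct proportion to the test interval, which is the point the two charts make.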
12
Non-Disruptive DR Testing
SRM provides non-disruptive testing of disaster recovery plans
Production Site Recovery Site
Copy of Production
Replication
Suspended Test/Dev VMs
Isolated test network at recovery site
13
DR Coverage Often Limited Due To High Protection Costs
Tier 1 Apps - Protected
Tier 2 / 3 Apps – Backup only
Corporate Datacenter
Small Sites – Backup only
Small Business Remote Office / Branch Office
Need to expand DR protection
• Tier 2 / 3 applications in larger datacenters
• Small and medium businesses
• Remote office / branch offices
14
vSphere Replication For Cost-Efficient, Simple Replication
Reduce storage costs by 2X
• Support for heterogeneous storage across sites, including non-replicating storage
• Use lower-end or older storage at failover site
Eliminate replication software costs
• vSphere Replication included with Site Recovery Manager at no additional cost
Manage replication directly from vCenter
• Eliminate complex interactions with storage teams
Manage replication at the individual VM level
• Eliminate need for complicated VM-to-LUN mapping
15 minute RPOs
• Set RPOs between 15 minutes and 24 hours
Efficient network utilization
• Replicate only changed disk areas
Highly scalable
• 500 virtual machines
Limitations
• No automated failback
• File-level consistency only (except planned migration)
• No FT, templates, linked clones, physical RDMs
Cost-efficient Simple Powerful
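The RPO range above can be read as a simple rule: a per-VM RPO must fall between 15 minutes and 24 hours, and a VM is out of compliance when its newest replica is older than that RPO. The sketch below only illustrates that rule. The function names are hypothetical and are not part of any SRM or vSphere Replication API.

```python
from datetime import datetime, timedelta

# Limits taken from the slide: RPOs between 15 minutes and 24 hours.
MIN_RPO = timedelta(minutes=15)
MAX_RPO = timedelta(hours=24)


def validate_rpo(rpo: timedelta) -> timedelta:
    """Reject RPO settings outside the supported range (hypothetical helper)."""
    if not (MIN_RPO <= rpo <= MAX_RPO):
        raise ValueError(f"RPO must be between {MIN_RPO} and {MAX_RPO}, got {rpo}")
    return rpo


def rpo_violated(last_replica: datetime, now: datetime, rpo: timedelta) -> bool:
    """True when the newest replica is older than the RPO allows."""
    return now - last_replica > rpo
```

For example, with a 1-hour RPO, a replica taken 90 minutes ago is a violation and one taken 30 minutes ago is not.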
15
SRM Provides Broad Choice of Replication Options
vSphere Replication: Simple, cost-efficient replication for Tier 2 applications and smaller sites
Storage-based Replication: High-performance replication for business-critical applications in larger sites
Diagram: Site A (Primary) and Site B (Recovery), each with vCenter Server, Site Recovery Manager, and vSphere, linked by vSphere Replication and storage-based replication
16
Automate DR Failover Processes
Overview
Automatically detect site failures
§ Require user to manually initiate failover
Automate recovery process
§ Stop replication and present replicated LUNs to vSphere
§ Execute user-defined recovery plan
Benefits
Ensure fast and predictable failovers and migrations
§ Consistently meet business requirements
Minimize risk of user errors
Site B Site A
Replication
1 Raise alert when heartbeat lost
2 User initiates failover
3 Stop replication and present LUNs to vSphere
4 Recover VMs
DR Failover
vSphere vSphere
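The four numbered steps can be sketched as a small sequence with a manual gate. This is a conceptual model of the slide, not SRM code: the step names and the `run_failover` function are invented for illustration. The property it shows is that detection is automatic while failover itself only starts after a user initiates it.

```python
from enum import Enum, auto
from typing import List


class Step(Enum):
    ALERT_RAISED = auto()        # 1 Raise alert when heartbeat lost
    FAILOVER_INITIATED = auto()  # 2 User initiates failover
    LUNS_PRESENTED = auto()      # 3 Stop replication and present LUNs to vSphere
    VMS_RECOVERED = auto()       # 4 Recover VMs


def run_failover(heartbeat_lost: bool, user_initiates: bool) -> List[Step]:
    """Return the steps completed, in order (hypothetical model)."""
    completed: List[Step] = []
    if not heartbeat_lost:
        return completed
    completed.append(Step.ALERT_RAISED)
    if not user_initiates:
        # Failover is never started automatically; the alert just stays raised.
        return completed
    completed.extend(
        [Step.FAILOVER_INITIATED, Step.LUNS_PRESENTED, Step.VMS_RECOVERED]
    )
    return completed
```

Calling `run_failover(True, False)` stops after the alert, which mirrors the "require user to manually initiate failover" bullet.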
17
Planned Migrations For App Consistency & No Data Loss
Overview
Two workflows can be applied to recovery plans:
§ DR failover
§ Planned migration
Planned migration ensures application consistency and no data loss during migration
§ Graceful shutdown of production VMs in application-consistent state
§ Data sync to complete replication of VMs
§ Recover fully replicated VMs
Benefits
Better support for planned migrations
§ No loss of data during migration process
§ Recover ‘application-consistent’ VMs at recovery site
Planned Migration
Site B Site A
Replication
1 Shut down production VMs
2 Sync data, stop replication and present LUNs to vSphere
3 Recover app-consistent VMs
vSphere vSphere
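Why planned migration loses no data while a DR failover can is easy to see with a toy count of writes. The sketch below is an illustration under simplified assumptions (writes are counted as integers, and the final sync always completes). The function name is hypothetical.

```python
def writes_lost(production_writes: int, replicated_writes: int, planned: bool) -> int:
    """Writes missing at the recovery site once the workflow has finished."""
    if replicated_writes > production_writes:
        raise ValueError("replica cannot be ahead of production")
    if planned:
        # Planned migration: production VMs are shut down first, so no new
        # writes arrive, and the final data sync copies whatever was pending.
        replicated_writes = production_writes
    # DR failover: the site is gone, so only already-replicated writes survive.
    return production_writes - replicated_writes
```

With 1000 writes at the production site and 940 replicated so far, a DR failover loses 60 writes and a planned migration loses none.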
18
Automated Failback To Streamline Bi-Directional Migrations
Overview
Re-protect VMs from Site B to Site A
§ Reverse replication
§ Apply reverse resource mapping
Automate failover from Site B to Site A
§ Reverse original recovery plan
Restrictions
§ Does not apply if Site A has undergone major changes / been rebuilt
§ Not available with vSphere Replication
Benefits
Simplify failback process
§ Automate replication management
§ Eliminate need to set up new recovery plan
Streamline frequent bi-directional migrations
Automated Failback
Site B Site A
Reverse Replication
Reverse original recovery plan
vSphere vSphere
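"Apply reverse resource mapping" can be pictured as inverting the Site A to Site B mapping table. The sketch below is a conceptual illustration only: the resource names are invented and the function is not an SRM API. It also shows why the mapping has to be one-to-one to be reversible.

```python
from typing import Dict


def reverse_mapping(mapping: Dict[str, str]) -> Dict[str, str]:
    """Invert a Site A -> Site B resource mapping into Site B -> Site A."""
    reversed_map: Dict[str, str] = {}
    for source, target in mapping.items():
        if target in reversed_map:
            # Two Site A resources share one Site B target: no unique way back.
            raise ValueError(f"mapping is not one-to-one for target {target!r}")
        reversed_map[target] = source
    return reversed_map


# Hypothetical resource names, for illustration only.
a_to_b = {"siteA-network-prod": "siteB-network-dr", "siteA-pool-1": "siteB-pool-1"}
b_to_a = reverse_mapping(a_to_b)
```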
19
History Reports – For Auditing
20
History Reports - For Auditing
21
Beyond DR: Disaster Avoidance And Planned Migrations
3 typical use-cases for SRM
Disaster Failover
Recover from unexpected site failure
• Full or partial site failure
The most critical but least frequent use-case
• Unexpected site failures do not happen often
• When they do, fast recovery is critical to the business
Disaster Avoidance
Anticipate potential datacenter outages
• For example: a forecast hurricane, floods, forced evacuation, etc.
Initiate preventive failover for smooth migration
• Leverage SRM ‘planned migration’ to ensure no data loss
• ‘Automated failback’ enables easy return to original site
Planned Migration
Most frequent SRM use case
• Planned datacenter maintenance
• Global load balancing
Streamline routine migrations across sites
• Test to minimize risk
• Execute partial failovers
• Leverage SRM ‘planned migration’ to ensure no data loss
• ‘Automated failback’ enables bi-directional migrations
22
SRM Provides Broad Application Coverage
Chart: application coverage by RTO (Continuous, Hours, Days) and RPO (Synchronous, Hours, Days)
App-level geo-clustering / load balancing
Site Recovery Manager
RTO: 30 minutes to hours
RPO: Flexible based on storage replication
Tier 1 Apps, Tier 2 Apps, Tier 3 Apps
23
SRM Supports Flexible Topologies
Active-Passive Failover
• Most common traditional scenario
• Expensive dedicated resources
Active-Active Failover
• Leverage recovery infrastructure for test, development, training
• Utilize sunk cost of recovery site
Bi-directional Failover
• Production applications at both sites
• Each site acts as the recovery site for the other
Shared Recovery Sites
• Many-to-one failover
• Particularly useful for Remote Office / Branch Office
24
Points to consider first
§ Distinguish between Service Disruption and Disaster – What is and What’s Not
§ Availability is not the same as Disaster Recovery
§ Disaster Recovery Procedures (DRP) and Business Continuity Procedures (BCP) – Understand the Differences
§ Important points to note about DRP and BCP solutions
25
Important points to note about DRP and BCP solutions
§ No single product provides disaster recovery or business continuity
§ Companies are dynamic
§ New systems and applications are brought on-line
§ Old systems and applications are retired
§ DRPs and BCPs must be constantly updated to match the current operational reality.
§ Disaster recovery and business continuity are not products
§ No one product can give a company “instant disaster recovery protection” or “instant business continuity planning”.
§ VMware Site Recovery Manager (SRM) is a product that helps companies quickly restore an organization’s IT infrastructure with automation.
§ SRM must be combined with other products and technologies, and with effective disaster recovery planning and effective business continuity planning.
26
Key Components Of SRM 5
vCenter Server Site Recovery Manager
Protected Site Recovery Site
Storage
vCenter Server Site Recovery Manager
vSphere vSphere
Storage
Replication Options
vSphere Replication • Bundled with SRM
Storage-Based Replication (3rd party)
Site Recovery Manager 5 • 1 per site
vCenter Server 5 • 1 per site • Standard or Foundation
vSphere 3.5, 4.x or 5 • Standard, Enterprise or Enterprise Plus
27
SRM Architecture with vSphere Replication (VR)
“Protected” Site and “Recovery” Site
Diagram components: vSphere Client with SRM Plug-in, SRM Server, vCenter Server, VRMS, VR Server, ESX hosts, VRA, VMFS storage
28
Sales Tools > Competitive Selling
“Competitive Selling” on Partner Central
§ Competitive Toolkits
§ Competitive Sales Tools
• Quick Positioning Cards
• Competitive Flashes
• Presentations
• White papers
• Analyst reports
§ Competitor Pages
29
Summary and Next Steps
§ vCenter Site Recovery Manager
• Product Page – www.vmware.com/products/srm
• Overview, datasheet, webinars, docs, community links
• Free 60-day Evaluation – all you need to get started!
§ SRM 5 Evaluator Guide
• http://www.vmware.com/files/pdf/products/SRM/VMware-vCenter-Site-Recovery-Manager-Evaluation-Guide.pdf
§ VMware Interoperability Matrix
• http://partnerweb.vmware.com/comp_guide/sim/interop_matrix.php
§ Business Continuity
• Solutions from VMware – www.vmware.com/solutions/continuity
30
Thank You