Data Center Business Continuance and Disaster Recovery
Maciej Bocian
[email protected] Sales Manager
© 2009 Cisco Systems, Inc. All rights reserved. Cisco ConfidentialPresentation_ID 1
Data Center and Virtualization, Central Europe
CCIE#7785
Business Continuance Drivers
• Cost of application downtime, lost data and productivity
• Regulatory mandates (Homeland Defense, Basel II, HIPAA, GLB, SEC)
– Firms must recover business operations the same business day a disruption occurs
– “Out-of-region” data center, 200+ km away
– Mandates backup data centers on separate grids
[Images: Hurricanes; The Northeast Blackout; NYC Blizzard of 2003]
Business Continuance Is More Critical than Ever
75% of IT decision-makers have altered Disaster Recovery/Business Continuance programs as a result of September 11
Following a disaster, 43% of directly affected businesses do not reopen and 29% fail within 24 months as a result
Only 15% of Global 2000 enterprises have a full-fledged business continuity plan.
Disasters: fire, storm, floods, earthquakes, chemical accidents, nuclear accidents, wars
Sources: Disaster Recovery Journal, Gartner Group
Agenda
Introduction to Data Center - The Evolution
Data Center Disaster Recovery
– Objectives
– Failure Scenarios
– Design Options
Components of Disaster Recovery
– Site Selection - Front End GSLB
– Server High Availability - Clustering
– Data Replication and Synchronization - SAN Extension
Sample Design
The Evolution of Data Centers
Data Center Evolution
[Figure: business agility plotted against time (1960, 1980, 2000, 2010). Compute evolution labels: Mainframes, Client/Server, Internet Computing, Optimization. Network evolution labels: Terminal, TCP/IP, Thin Client: HTTP, Content Networking, Data Center Networking. Networked data center phase: 1. Consolidation, 2. Integration, 3. Distributed, 4. High Availability, shown as Data Center Consolidation, Data Center Distributed and Data Center Continuous Availability.]
What Is Involved in a Data Center
• Application solution: Linux/HP, Solaris/SunFire, WebLogic, J2EE custom app, etc.
• Database solution: Linux/HP, Solaris/SunFire, Oracle 10G RAC, etc.
• Storage solution: MDS9000
• Network infrastructure solution: Cisco GSRs, Cisco Catalyst 6500, Cisco Catalyst Cat4000
• Layer 4–7 services solution: CSM, SSLM, CSS, CE, GSS
• Network security solution: PIX®, FWSM, IDSM, VPNSM, CSA
• Management and instrumentation solution: Terminal servers, NAM, CiscoWorks LMS/VMS, HSE
What Is a Distributed Data Center
[Figure: a primary data center (APP A, APP B) and a secondary data center (APP A, APP C), each with FC storage, linked by data replication.]
Why Distributed Data Centers
Provide disaster recovery and business continuance
Avoid a single, concentrated data depository
High availability of applications and data access
Load balancing together with performance scalability
Better response and optimal content routing: proximity to clients
Front-end IP Access Layer
“Content Routing”: site selection
[Figure, repeated on the next two slides: primary data center (APP A, APP B) and secondary data center (APP A, APP C), each with FC storage.]
Application and Database Layer
“Content Switching”: load balancing
“Server Clustering”: high availability
Backend SAN Extension
“Storage” & “Optical”: data mirroring and replication
Data Center Disaster Recovery
Disaster Recovery
Recovery of data and resumption of service - ensuring business can recover and continue after failure or disaster
Ability of a business to adapt, change and continue when confronted with various outside impacts
Mitigating the impact of a disaster
What It Means for Business
• Business Resilience: continued operation of business during a failure
• Business Continuance: restoration of business after a failure
• Disaster Recovery: protecting data through offsite data replication and backup
Zero downtime is the ultimate goal
Disaster Recovery Planning
• Business Impact Analysis (BIA): determines the impacts of various disasters on specific business functions and company assets
• Risk Analysis: identifies important functions and assets that are critical to the company’s operations
• Disaster Recovery Plan (DRP): restores operability of the target systems, applications, or computing facility at the secondary data center after the disaster
Disaster Recovery Objectives
Recovery Point Objective (RPO)
– The point in time (prior to the outage) to which systems and data must be restored
– Tolerable loss of data in the event of disaster or failure
– The impact of data loss and the cost associated with the loss
Recovery Time Objective (RTO)
– The period of time after an outage in which the systems and data must be restored to the predetermined RPO
– The maximum tolerable outage time
Recovery Access Objective (RAO)
– Time required to reconnect users to the recovered application, regardless of where it is recovered
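As a rough illustration of how the recovery point and recovery time of an incident can be checked against RPO and RTO, the sketch below treats both as simple time differences. The function names, the labels t0/t1/t2 and the sample timestamps are assumptions made for this example, not part of the presentation.

```python
from datetime import datetime, timedelta

def recovery_point(last_copy: datetime, disaster: datetime) -> timedelta:
    """Data-loss window: time between the last recoverable copy (t0) and the disaster (t1)."""
    return disaster - last_copy

def recovery_time(disaster: datetime, restored: datetime) -> timedelta:
    """Outage window: time between the disaster (t1) and service restoration (t2)."""
    return restored - disaster

def meets_objectives(last_copy: datetime, disaster: datetime, restored: datetime,
                     rpo: timedelta, rto: timedelta) -> bool:
    """True when both the data-loss window and the outage window are within target."""
    return (recovery_point(last_copy, disaster) <= rpo
            and recovery_time(disaster, restored) <= rto)

# Hypothetical incident: nightly copy at 02:00, disaster at 14:30, service back at 18:30.
t0 = datetime(2009, 3, 1, 2, 0)
t1 = datetime(2009, 3, 1, 14, 30)
t2 = datetime(2009, 3, 1, 18, 30)
```

With these sample values a 24-hour RPO and 8-hour RTO are met, while a 1-hour RPO is not, because 12.5 hours of changes were made after the last copy.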
Recovery Point/Time vs. Cost
[Figure: timeline from time t0 through time t1 (disaster strikes) to time t2 (critical data is recovered; systems recovered and operational). The recovery point is measured back from the disaster (scale: secs, mins, hours, days); the recovery time is measured forward from it (scale: secs, mins, hours, days, weeks).]
Recovery point, smallest to largest: Synchronous Replication, Asynchronous Replication, Periodic Replication, Tape backup
Recovery time, smallest to largest: Extended Cluster, Manual Migration, Tape Restore
Smaller RPO/RTO: higher $$$, replication, hot standby
Larger RPO/RTO: lower $$$, tape backup/restore, cold standby
Failure Scenarios
Disaster could mean many types of failure:
• Network Failure
• Device Failure
• Storage Failure
• Site Failure
Network Failures
• ISP failure: dual ISP connections, multiple ISPs
• Connection failure within the network: EtherChannel, multiple route paths
Device Failures
• Routers, switches, firewalls: HSRP, VRRP
• Hosts: HA cluster
Storage Failures
• Disk arrays: RAID
• Disk controllers
Site Failures
• Partial site failure: application maintenance, application migration, application scheduled DR exercise
• Complete site failure: disaster
[Figure on each of these slides: the data center connected to the Internet through Service Provider A and Service Provider B.]
Cold Standby
One or more data centers with appropriately configured space, equipped with pre-qualified environmental, electrical, and communication conditioning
Hardware and software installation, network access, and data restoration all need manual intervention
Least expensive to implement and maintain
Substantial delay from standby to full operation
Disaster Recovery – Active/Standby
[Figure: primary data center (APP A, APP B) and secondary data center in cold standby (APP A, APP B), each with FC storage.]
Warm Standby
A data center that is partially equipped with hardware and communications interfaces capable of providing backup operating support
Latest backups from the production data center must be delivered
Network access needs to be activated
Provides better RTO and RPO than Cold Standby Backup
Disaster Recovery – Active/Standby
[Figure: primary data center (APP A, APP B) and secondary data center in warm standby (APP A, APP B), each with FC storage, connected by an IP/optical network.]
Hot Standby
A data center that is environmentally ready and has sufficient hardware and software to provide data processing service with little or no downtime
Hot Backup offers disaster recovery with little or no human intervention
Application data is replicated from the primary site
A hot backup site provides very good RTO and RPO
Disaster Recovery – Active/Standby
[Figure: primary data center (APP A, APP B) and secondary data center (APP A, APP C), each with FC storage, connected by an IP/optical network.]
Disaster Recovery – Active/Active
What does Active/Active mean?
Multiple Tiers of Application
• Presentation Tier
• Application Tier
• Storage Tier
[Figure: the tiers of two data centers behind the Internet, Service Provider A and Service Provider B.]
Active/Active Data Centers
• Active/Active Web Hosting
• Active/Active Application Processing
• Active/Standby or Active/Active Database Processing
[Figure: two data centers reached from the Internet (Service Provider A, Service Provider B) and from the internal network.]
Disaster Recovery Components
Site Selection Mechanisms
Site selection mechanisms depend on the technology or mix of technologies adopted for request routing:
1. HTTP Redirect
2. DNS Based
3. L3 Routing with Route Health Injection (RHI)
Health of servers and/or applications needs to be taken into account
Optionally, other metrics (like load) can be measured and utilized for a better selection
HTTP Redirection – The Idea
Leveraging the HTTP redirect function: HTTP return code 302
Proper site selection is made after the initial DNS request has been resolved, via redirection
Mainly used as a method of providing site persistence while providing local server farm failure recovery
Can be used with the “Location Cookie” feature of the CSS to provide redirection after wrong site selection
HTTP Redirection – Traffic Flow
[Figure: a client request to http://www.cisco.com/ is redirected to http://www1.cisco.com/ or http://www2.cisco.com/.]
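The redirect idea can be sketched with Python's standard `http.server` module. This is only an illustration under stated assumptions: the `SITES` health table, the `pick_site` helper and the port are hypothetical, and a real deployment would take health status from the load balancers rather than a static dictionary.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Optional

# Hypothetical health table: site-specific base URL -> is the local server farm healthy?
SITES: Dict[str, bool] = {
    "http://www1.cisco.com": True,
    "http://www2.cisco.com": True,
}

def pick_site(sites: Dict[str, bool]) -> Optional[str]:
    """Return the first healthy site in table order, or None if none is available."""
    for base_url, healthy in sites.items():
        if healthy:
            return base_url
    return None

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        site = pick_site(SITES)
        if site is None:
            self.send_error(503, "No data center available")
            return
        # HTTP 302 sends the client to the site-specific fully qualified name.
        self.send_response(302)
        self.send_header("Location", site + self.path)
        self.end_headers()

def serve(port: int = 8080) -> None:
    """Run the redirector until interrupted."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Because the client follows the `Location` header to a site-specific name, later requests stay on that site, which is the persistence property the slides describe.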
Advantages of the HTTP Redirection Approach
Can be implemented without any other GSLB devices or mechanisms
Inherent persistence to the selected location
Can be used in conjunction with other methods to provide more sophisticated site selection
Limitations of the HTTP Redirection Approach
It is protocol specific: relies on HTTP
Requires redirection to fully qualified additional names, which means additional DNS records
Users may bookmark a specific location, losing automatic failover
HTTPS redirect requires the full SSL handshake to be completed first
DNS-Based Site Selection – The Idea
The client D-proxy (local name server) performs iterative queries
The device which acts as “site selector” is the authoritative name server for the domain(s) distributed in multiple locations
The “site selector” sends keepalives to servers or server load balancers in the local and remote locations
The “site selector” selects a site for the name resolution, according to the pre-defined answers and site load balance method
The user traffic is sent to the selected location
DNS-Based Site Selection – Traffic Flow
[Figure: the client requests http://www.cisco.com/; its DNS proxy queries, in turn, the root name server / authoritative name server for .com, the authoritative name server for cisco.com, and the authoritative name server for www.cisco.com over UDP:53 (steps 1–10); the client then connects over TCP:80 to the selected site, Data Center 1 or Data Center 2.]
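A minimal sketch of the "site selector" logic, assuming round robin as the site load balance method and a boolean `alive` flag standing in for the keepalive result. The class names and the documentation-range IP addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Site:
    name: str
    vip: str            # address returned in the A-record answer
    alive: bool = True  # outcome of the most recent keepalive

class SiteSelector:
    """Answers name queries with the VIP of a healthy site, round robin."""

    def __init__(self, sites: List[Site]) -> None:
        self.sites = list(sites)
        self._next = 0  # index of the site to try first on the next query

    def resolve(self) -> Optional[str]:
        n = len(self.sites)
        for offset in range(n):
            index = (self._next + offset) % n
            if self.sites[index].alive:
                self._next = (index + 1) % n
                return self.sites[index].vip
        return None  # no site passed its keepalive

selector = SiteSelector([Site("dc1", "192.0.2.10"), Site("dc2", "198.51.100.10")])
```

Successive calls to `selector.resolve()` alternate between the two VIPs; marking a site as not alive removes it from the answers until its keepalive succeeds again.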
Advantages of the DNS Approach
Protocol independent: works with any application that uses name resolution
Minimal configuration changes in the current IP and DNS infrastructure (DNS authoritative server)
Implementation can be different for specific host names
A-records can be changed on the fly
Can take load or data center size into account
Can provide proximity
Limitations of the DNS-Based Approach
Visibility limited to the D-proxy (not the client)
Cannot guarantee 100% session persistence
DNS caching in the D-proxy
DNS caching in the client application
Order of multiple A-record answers can be altered by D-proxies
Route Health Injection – The Idea
Server and application health monitoring provided by local Server Load Balancers (SLB)
The SLB can advertise or withdraw the VIP address to upstream routing devices depending on the availability of the local server farm
The same VIP addresses can be advertised from multiple data centers: IP Anycast
Relies on L3 routing protocols for route propagation and content request routing
Disaster recovery provided by network convergence
Route Health Injection – Implementation
[Figure: Client A and Client B reach VIP x.y.w.z through Routers 10–13. Location B is the preferred location for the VIP (low cost); Location A is the backup location (very high cost).]
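The advertise/withdraw behavior can be sketched as a toy routing table for a single VIP host route. This is an illustration, not a routing protocol implementation: the `Router` class, the `update_rhi` helper and the cost values are assumptions chosen to mirror the figure's low-cost preferred site and high-cost backup site.

```python
from typing import Dict, Optional

class Router:
    """Toy view of one upstream router: advertising site -> route cost for the VIP."""

    def __init__(self) -> None:
        self.routes: Dict[str, int] = {}

    def advertise(self, site: str, cost: int) -> None:
        self.routes[site] = cost

    def withdraw(self, site: str) -> None:
        self.routes.pop(site, None)

    def best_path(self) -> Optional[str]:
        """Site whose advertisement has the lowest cost, or None if the VIP is unreachable."""
        if not self.routes:
            return None
        return min(self.routes, key=lambda site: self.routes[site])

def update_rhi(router: Router, site: str, cost: int, farm_healthy: bool) -> None:
    """What the SLB does after each health check: inject or remove the host route."""
    if farm_healthy:
        router.advertise(site, cost)
    else:
        router.withdraw(site)

router = Router()
update_rhi(router, "Location B", cost=10, farm_healthy=True)    # preferred, low cost
update_rhi(router, "Location A", cost=1000, farm_healthy=True)  # backup, very high cost
```

When the server farm at Location B fails its health check, calling `update_rhi(router, "Location B", 10, False)` withdraws the route and `best_path()` falls back to Location A once the network has converged.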
Advantages of the RHI Approach
Supports legacy applications and does not rely on a DNS infrastructure
Very good re-convergence time, especially in intranets where L3 protocols can be fine-tuned appropriately
Protocol independent: works with any application
Robust protocols and proven features
Limitations of the RHI Approach
Relies on host routes (32 bits), which cannot be propagated all over the Internet (more on this later)
Requires tight integration between the application-aware devices and the L3 routers
Inability to intelligently load balance among the data centers
Cluster Overview
A cluster is two or more servers configured to appear as one
Two types of clustering: Load Balancing (LB) and High Availability (HA)
Clustering provides benefits for availability, reliability, scalability, and manageability
LB clustering: multiple copies of the same application against the same data set, usually read only
HA clustering: multiple copies of a long-running application that requires access to a common data depository, usually read and write
[Figure: Web Servers, Application Servers, Database Servers.]
HA Cluster Connections
Public network (typically Ethernet) for client/application requests
Servers with the same hardware, OS, and application software
Private network (typically Ethernet) for interconnection between nodes. Could be a direct connection, or optionally go through the public network
Storage disk (typically Fiber): shared storage array, NAS or SAN
Typical HA Cluster Components
Application software that is clustered to provide high availability. Example: Microsoft Exchange, SQL, Oracle database, File and Print Services
Operating system that runs on the server hardware. Example: Microsoft Windows 2000 or 2003, Linux (and the other flavors of UNIX), IBM MVS or z/OS (for mainframe)
Cluster software that provides the HA clustering service for the application. Example: Microsoft MSCS, EMC AutoStart (Legato), Veritas Cluster Server, HP TruCluster and OpenVMS
Optionally, Cluster Enabler, software that synchronizes the cluster software with the storage disk array software
Basic HA Cluster Design
Active/Standby:
– Active node takes client requests and writes to the data
– Standby takes over when it detects a failure on the active node
– Two-node or multi-node
Active/Active:
– Database requests load balanced to both nodes
– Lock mechanism ensures data integrity
– Most scalable design
[Figure: node1 and node2.]
File System Approaches for HA Clusters
Shared Everything
– Equal access to all storage
– Each node mounts all storage resources
– Provides a single layout reference system for all nodes
– Changes updated in the layout reference
Shared Nothing
– Traditional file system with peer-to-peer communication
– Each node mounts only its “semi-private” storage
– Data stored on the peer system’s storage is accessed via the peer-to-peer communication
– Failed node’s storage needs to be mounted by the peer
Geo-clusters
Geo-cluster: a cluster that spans multiple data centers
[Figure: node1 in the local datacenter and node2 in the remote datacenter, connected over the WAN; disk replication between the sites is synchronous or asynchronous, annotated “2 x RTT”.]
Considerations for HA Clusters
Split brain: cluster partitioning when nodes cannot communicate with each other but are equally capable of forming a cluster and mounting disks
Extended L2 required in most implementations for:
– Public network, since the client only knows about the virtual IP address
– Private network, used for heartbeats
Storage:
– Directly Attached Disk (DAS) cannot be used
– Shared disk needs to be visible to both nodes
– Needs to interface with cluster software for disk failover, zoning, and LUN masking when there is a node failure
Split-Brain
Split-brain happens when all of the network communication links between two or more cluster nodes fail
Both nodes could potentially go active and concurrently access the disk, thus corrupting data
[Figure: node1 and node2 both writing to the shared disk, resulting in data corruption.]
Resolution for Split Brain: Quorum
A quorum device serves as a tie breaker to arbitrate which system has access to resources
The quorum ensures that even if there is no communication between the nodes, only one node can continue to access the disk
Only the node that owns the quorum (or the majority of quorum votes) can bring resources online
Any resource can be used as the arbitrator to break the tie
[Figure: node1 and node2 sharing a quorum disk and application data.]
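The majority rule behind quorum can be sketched as follows. The vote counts and the helper names are hypothetical; the example assumes two nodes with one vote each plus a quorum device worth one vote, so three votes in total.

```python
from typing import Dict, Optional

def has_quorum(votes_held: int, total_votes: int) -> bool:
    """A partition may bring resources online only with a strict majority of votes."""
    return 2 * votes_held > total_votes

def surviving_partition(partitions: Dict[str, int], total_votes: int) -> Optional[str]:
    """Name of the partition allowed to stay active, or None if nobody has a majority.

    At most one partition can hold a strict majority, so the result is unambiguous.
    """
    for name, votes in partitions.items():
        if has_quorum(votes, total_votes):
            return name
    return None

# Heartbeat links are down; node1 has won the reservation on the quorum device.
after_split = {"node1": 2, "node2": 1}
```

Here `surviving_partition(after_split, 3)` picks node1, so node2 must stay offline instead of mounting the shared disk, which is what prevents the data corruption described on the split-brain slide.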
Extended Layer 2 Network
In most implementations, a common L2 network is needed for the heartbeat between the nodes, as well as public client access
Extending VLANs on a geographical basis is not considered best practice because of the impact of broadcasts, multicast, flooding and Spanning-Tree integration issues
[Figure: node1 in the local datacenter and node2 in the remote datacenter, joined across the WAN by a public Layer 2 network and a private Layer 2 network; disk replication is synchronous or asynchronous.]
Resolution: L3 Routed Solution
In certain cases an L3 routed solution is possible
Microsoft MSCS
– Requires that the 2 nodes be on the same subnet
– The communication between the 2 nodes is UDP unicast
– Local Area Mobility (LAM) allows the placement of the nodes on 2 different subnets
Veritas VCS
– Allows having nodes with IP addresses in different subnets
– The virtual address needs to change when moving from node1 to node2
– DNS can be used to map one name to multiple IP addresses
[Figure: node1 on subnet 11.20.5.x and node2 on subnet 172.28.210.x, connected by an extended SAN; disk replication is synchronous or asynchronous.]
Storage Disk Zoning
Which storage disk array should node 2 be zoned to before and after a failure on node 1?
To complete the failover you need to change the zoning configuration
Software is needed to synchronize the cluster software with the disk array’s software, i.e. Cluster Enabler
[Figure: node1 (active) and node2 (standby) over an extended SAN, with disk arrays sym1320 (RW) and sym1291 (RD).]
Resolution: Cluster Enabler
The Cluster Enabler (CE) provides the interface between the clustering software and the disk array’s software
When the clustering software detects a failure and wants to fail the node, the Cluster Enabler instructs the disk array to perform a failover
Cluster Enabler also allows node1 to be zoned to sym1320 and node2 to be zoned to sym1291
The Cluster Enabler running on each node typically communicates with the Cluster Enabler software running on the remote node with local multicast messages
[Figure: node1 (active) and node2 (standby) over an extended SAN, with disk arrays sym1320 (RW) and sym1291 (WD).]
Terminology
Storage subsystem
– Just a bunch of disks (JBOD)
– Redundant array of independent disks (RAID)
Storage I/O devices
– Host Bus Adapter (HBA)
– Small Computer System Interface (SCSI)
Storage protocols
– SCSI
– iSCSI
– FC (FCIP)
Terminology (Cont’d)
Direct Attached Storage (DAS)
– Storage is “local” behind the server
– No storage sharing possible
– Costly to scale; complex to manage
Network Attached Storage (NAS)
– Storage is accessed at a file level over an IP network
– Storage can be shared between servers
Storage Area Networks (SAN)
– Storage is accessed at a block level
– Separation of storage from the server
– High-performance interconnect providing high I/O throughput
Storage for Applications
Presentation Tier
– Unrelated small data files commonly stored on internal disks
– Manual distribution
Application Processing Tier
– Transitional, unrelated data
– Small files residing on file systems
– May use RAID to spread data over multiple disks
Storage Tier
– Large, permanent data files or raw data
– Large batch updates, most likely real time
– Log and data on separate volumes
Backup and Replication
Offsite tape vaulting
– Backup tapes stored at offsite location
Electronic vaulting
– Transmission of backup data to offsite location
Remote disk replication
– Continuous copying of data to offsite location
– Transparent to host
Other methods of replication
– Host-based mirroring
– Network-based replication
Replication: Modes of Operation
Synchronous
– All data written to cache of local and remote arrays before I/O is complete and acknowledged to host
Asynchronous
– Write acknowledged after write to local array cache; changes (writes) are replicated to remote array asynchronously
Semi-synchronous
– Write acknowledged with a single subsequent WRITE command pending from remote array
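The difference between the synchronous and asynchronous modes can be sketched with a toy volume model. The `ReplicatedVolume` class and its methods are hypothetical names for this example; real arrays track writes in cache and handle ordering and failures in far more detail.

```python
from typing import List, Optional

class ReplicatedVolume:
    """Toy model of a primary array with one remote copy."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.local: List[str] = []    # writes held by the local array
        self.remote: List[str] = []   # writes held by the remote array
        self.pending: List[str] = []  # acknowledged writes not yet at the remote array

    def write(self, block: str) -> str:
        self.local.append(block)
        if self.synchronous:
            # The remote array holds the write before the host sees the acknowledgement.
            self.remote.append(block)
        else:
            # Acknowledge immediately; replication happens later.
            self.pending.append(block)
        return "ack"

    def drain(self, count: Optional[int] = None) -> None:
        """Ship up to `count` pending writes (all if None) to the remote array, in order."""
        n = len(self.pending) if count is None else min(count, len(self.pending))
        self.remote.extend(self.pending[:n])
        del self.pending[:n]

    def lost_on_primary_failure(self) -> List[str]:
        """Acknowledged writes that would be missing at the remote site right now."""
        return list(self.pending)
```

With `synchronous=True` the pending list is always empty, so nothing is lost if the primary fails; with `synchronous=False` whatever has not been drained yet is the data-loss exposure.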
Synchronous vs. Asynchronous Trade-Off
Synchronous
• Impact to application performance
• Distance limited (are both sites within the same threat radius?)
• No data loss
Asynchronous
• No application performance impact
• Unlimited distance (second site outside threat radius)
• Exposure to possible data loss
Enterprises must evaluate the trade-offs
• Maximum tolerable distance ascertained by assessing each application
• Cost of data loss
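A rough way to assess the maximum tolerable distance for one application is to convert distance into added write latency. The sketch below assumes about 5 microseconds of one-way propagation delay per kilometer of fiber and two round trips per synchronous write; both figures are assumptions for illustration (the round-trip count depends on the protocol and on any write acceleration in use), and the function names are hypothetical.

```python
# Approximate one-way propagation delay of light in optical fiber (assumption).
FIBER_DELAY_US_PER_KM = 5.0


def sync_write_penalty_ms(distance_km, round_trips=2):
    """Latency added to each synchronous write by distance alone, in ms."""
    one_round_trip_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    return one_round_trip_us * round_trips / 1000.0


def max_sync_distance_km(latency_budget_ms, round_trips=2):
    """Largest distance whose propagation penalty fits the latency budget."""
    return latency_budget_ms * 1000.0 / (2 * FIBER_DELAY_US_PER_KM * round_trips)


print(sync_write_penalty_ms(100))   # added latency per write at 100 km
print(max_sync_distance_km(1.0))    # distance that fits a 1 ms budget
```

Equipment latency and queuing are ignored here, so real penalties are higher; the point is only that the penalty scales linearly with distance, which is why synchronous replication is distance limited.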
Data Replication with DB Example
• Control files identify the other files making up the database and record the content and state of the DB
• Datafiles are only updated periodically
• Redo logs record DB changes resulting from transactions
  • Used to play back changes that may not have been written to the datafiles when the failure occurred
  • Typically archived as they fill, to local and DR site destinations
[Diagram: control files (DB name, creation date, backup performed, redo log time period, datafile state) identify the datafiles (tablespaces, indexes, data dictionary) and the redo log files (database changes); redo log files record changes to the datafiles]
Data Replication with DB Example (Cont'd)
Failure or disaster occurs at time t1
• Media failure (e.g. disk)
• Human error (datafile deletion)
• Database corruption
[Timeline: hot backup of datafiles and control files taken at time t0; archived redo logs followed by online redo logs cover the interval from t0 to t1]
Database restored to state at time of failure (time t1) by:
1. Restoring control files and datafiles from last hot backup (time t0)
2. Sequentially replaying changes from subsequent redo logs (archived and online): changes made between time t0 and t1
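The two restore steps can be sketched in a few lines. This is a deliberately simplified model, not how a real database engine stores or applies redo: the function `recover`, the dictionary standing in for the datafiles, and the `(timestamp, key, value)` record layout are all hypothetical.

```python
def recover(backup, redo_records, t_failure):
    """Rebuild the datafile state as of t_failure.

    backup:       (t0, dict) hot backup taken at time t0
    redo_records: iterable of (timestamp, key, value) change records
    t_failure:    time t1 at which the failure occurred
    """
    t0, snapshot = backup
    # Step 1: restore the datafiles from the last hot backup.
    state = dict(snapshot)
    # Step 2: replay, in time order, every change made after t0 and up to t1.
    for timestamp, key, value in sorted(redo_records, key=lambda r: r[0]):
        if t0 < timestamp <= t_failure:
            state[key] = value
    return state


backup = (10, {"a": 1, "b": 2})
redo = [(5, "a", 0), (12, "a", 3), (15, "c", 7), (25, "b", 9)]
print(recover(backup, redo, t_failure=20))
```

Records older than the backup are skipped because the backup already contains them, and records newer than t1 never existed at the time of failure. If any redo log between t0 and t1 is missing at the recovery site, the replay stops short of t1, which is why the redo logs are the most critical data to replicate.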
Data Replication with DB Example (Cont'd)
[Diagram: primary site and secondary site linked by a SAN extension transport. Redo logs (cyclic): a copy of every committed transaction is synchronously replicated for zero loss. Archive logs: replicated/copied. Database: point-in-time copy taken when the DB is quiescent, replicated/copied as the database copy at time t0. Earlier DB backups are also shown.]
• Mixture of sync and async replication technologies commonly used
• Usually only redo logs are sync replicated to remote site
• Archive logs created from redo log and copied when redo log switches
• Point-in-time (PiT) copies of datafiles and control files copied periodically (e.g. nightly)
Data Center Interconnection Options
[Diagram: two data centers, each reached from the Internet through stateful firewalls, intrusion detection, server load balancing and content caching on a high-density multilayer LAN switch, with front-end and back-end application servers, a high-density multilayer SAN director and enterprise-class storage arrays. Interconnection options shown between the sites: SONET/SDH, DWDM/CWDM, IP/Metro E.]
Data Center Transport Options
Increasing distance: Data Center, Campus, Metro, Regional, National
Optical
• Dark fiber: Sync; limited by optics (power budget)
• CWDM: Sync (2Gbps); limited by optics (power budget)
• DWDM: Sync (2Gbps lambda); limited by BB_Credits
• SONET/SDH: Sync (1Gbps+ subrate), Async
IP
• MDS9000 FCIP: Sync (Metro Eth), Async (1Gbps+)
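The BB_Credits limit on longer optical links comes from Fibre Channel flow control: a sender can only have as many frames in flight as it has buffer-to-buffer credits, so keeping a long link full needs enough credits to cover one round trip. The estimate below is a sketch under stated assumptions: full-size 2148-byte frames, 8b/10b encoding at 1.0625 Gbaud per nominal Gbps, and about 5 microseconds per kilometer of fiber. The function name is hypothetical, and switch or array vendor sizing guidance should take precedence over this arithmetic.

```python
import math

FIBER_DELAY_US_PER_KM = 5.0   # approximate one-way delay in fiber (assumption)
FC_BAUD_PER_GBPS = 1.0625e9   # line rate per nominal 1 Gbps of Fibre Channel
BITS_PER_BYTE_8B10B = 10      # 8b/10b encoding sends 10 line bits per byte


def bb_credits_needed(distance_km, link_gbps, frame_bytes=2148):
    """Credits needed to keep a link of this length full of frames."""
    line_rate = link_gbps * FC_BAUD_PER_GBPS
    # Time to serialize one frame onto the link, in microseconds.
    frame_time_us = frame_bytes * BITS_PER_BYTE_8B10B / line_rate * 1e6
    # A credit is returned only after the frame arrives and the
    # acknowledgment travels back, so a full round trip must be covered.
    round_trip_us = 2 * distance_km * FIBER_DELAY_US_PER_KM
    return math.ceil(round_trip_us / frame_time_us)


print(bb_credits_needed(100, 2))  # 100 km at 2 Gbps
print(bb_credits_needed(100, 1))  # 100 km at 1 Gbps
```

The result works out to roughly one credit per kilometer at 2 Gbps and half that at 1 Gbps for full-size frames; smaller frames need proportionally more credits.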
Data Center Replication with SAN Extension
Extend the normal reach of a Fibre Channel fabric
• Replication
• Remote host to target array
• Shared data clusters
[Diagram: two FC fabrics joined by a SAN extension network, carrying replication, shared data cluster traffic, or remote host access to storage]
SAN Design for Data Replication
• Servers with two Fibre Channel connections to storage arrays for high availability
  • Use of multipath software is required in dual fabric host design
• SAN extension fabrics typically separate from host access fabrics
  • Replication fabric requirements generally specified by array vendor
[Diagram: Site A and Site B, each with FC server access fabrics and separate FC replication fabrics, joined by a DC interconnect network]
Data Center Disaster Recovery: Sample Design
Disaster Impact Radius
• Disasters are characterized by their impact
  • Local, metro, regional, global
  • Fire, flood, earthquake, attack
• Is the backup site within the threat radius?
[Diagram: concentric impact radii labeled local (1–2 km), metro (< 50 km), regional (< 400 km) and global, with the primary data center and the secondary data center (DR site) marked]
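The threat-radius question can be written as a simple check against the radii on this slide. The sketch below uses the upper bound of each labeled radius (2 km, 50 km, 400 km); the function and dictionary names are hypothetical, and treating a global disaster as one that no site distance escapes is an assumption made for the example.

```python
# Upper bounds of the impact radii shown on the slide, in kilometers.
IMPACT_RADIUS_KM = {"local": 2, "metro": 50, "regional": 400}


def backup_site_survives(site_distance_km, disaster_scope):
    """True if the backup site lies outside the disaster's impact radius."""
    if disaster_scope == "global":
        # Assumption: no separation distance protects against a global event.
        return False
    return site_distance_km > IMPACT_RADIUS_KM[disaster_scope]


for scope in ("local", "metro", "regional", "global"):
    print(scope, backup_site_survives(30, scope))
```

A site 30 km away is outside the local radius but still inside the metro radius, which is the situation that synchronous replication's distance limit tends to create.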
Active/Standby Architecture - Today
[Diagram: three sites. High Availability Site 1 (CA) with Hosts 1 and High Availability Site 2 (CA) with Hosts 2 run HA cluster(s); the Disaster Recovery Site (NC) with Hosts 3 uses electronic journaling. Each site has MDS 9509s and an MDS 9509 gateway. Storage labels: Storage 1, Storage 2, Storage 3, Bunker. Link labels: synchronous CWDM replication, synchronous FCIP replication, asynchronous FCIP replication, dual OC12.]
Frame Based Replication
[Diagram: production cluster in Data Center 1 and D/R in Data Center 2, each with an MDS and an EMC/DMX array, connected over dual OC12. Labels recovered from the figure: PROD, D/R, Redo, Arch, PiT, BCV, R2, BCV/R1, SRDF, SRDF/A, Timefinder, Triple Threat.]
Active/Active Architecture - Tomorrow
[Diagram: request flow from the user to two active data centers in a Service Locator Group.
• GSS performs site (DC) selection according to pre-configured condition, using FQDN
• ACE decrypts request, routes request, and probes to track application health
• ACNS (Content Engine) caches pages
• Requests directed to primary application, or to backup application
• DC1: clustered backend with X active, Y standby; Data X active, Data Y standby
• DC2: clustered backend with Y active, X standby; Data Y active, Data X standby
• Presentation layer mirror and asynchronous replication between DC1 and DC2]
SANTap and Continuous Data Protection
SANTap
• Appliance-based storage replication
• Reliable copy of WRITE operations
• SCSI-FCIP communication
Continuous Data Protection
• Automatic and continuous backups
• Time Addressable Storage (TAS)
• Any point-in-time recovery
• Application based or network based
[Diagram: production servers, MDS SAN with SANTap, CDP appliance, primary and secondary storage]
Fabric Based Replication with CDP
[Diagram: production cluster in Data Center 1 and D/R in Data Center 2, each with an MDS and a replication/CDP appliance, connected over dual OC12, with SANTap on the MDS. Labels recovered from the figure: PROD, D/R, Redo, Arch, BCV, APiT, SRDF/A, EMC/DMX, TAS/SATA.]
End-End Data Center Resilience
[Diagram: GSS-1 and GSS-2 with corporate DNS; ACE-1, ACE-2 and ACE-3 in front of the Web/App server farms in DC-1, DC-2 and DC-3; DB servers and FC storage at the primary and secondary locations; sites joined by an IP/optical network and CWDM/DWDM links]
Summary - Design Details
• Data Centers 1 and 2 are in the primary location, close enough to each other to provide DC HA for active/active access
• Data Center 3 (DR) is farther than the tolerable disaster radius from primary DCs 1 and 2
• Web/App server farms are load balanced geographically
• DB servers are within a geo-HA cluster and running in an L3 design
• Synchronous data replication is done between the data centers within the primary location
• Asynchronous data replication is done between the primary and secondary storage systems