Joint Business Launch
DISASTER RECOVERY AND MULTI-SITE CLUSTERING WITH WINDOWS SERVER 2008 R2
VIJAY TEWARI, PRINCIPAL PROGRAM MANAGER, WINDOWS SERVER
NOV 17, 2009

Session Objectives And Takeaways

Session Objective(s):
- Understanding the need for and benefits of multi-site clusters
- What to consider as you plan, design, and deploy your first multi-site cluster

Windows Server Failover Clustering is a great solution for not only high availability, but also disaster recovery

Multi-Site Clustering

- Introduction
- Networking
- Storage
- Quorum
- Workloads

Is my Cluster Resilient to Site Failures?

But what if there is a catastrophic event? Fire, flood, earthquake …

[Diagram: Site A, with the cluster and its SAN in the same physical location]

Multi-Site Clusters for DR

Extends a cluster from being a High Availability solution to also being a Disaster Recovery solution

- Node is moved to a physically separate site
- Applications are failed over to a separate physical location

[Diagram: Site A and Site B, each with its own SAN]

Benefits of a Multi-Site Cluster

- Protects against loss of an entire datacenter
- Automates failover
  - Reduced downtime
  - Lower complexity disaster recovery plan
- Reduces administrative overhead
  - Automatically synchronize application and cluster changes
  - Easier to keep consistent than standalone servers

The primary reason DR solutions fail is dependence on people

Multi-Site Clustering: Networking

Network Considerations

Network Options:
1. Stretch VLANs across sites
2. Cluster nodes can reside in different subnets

[Diagram: Site A and Site B joined by a public network (10.10.10.1 / 20.20.20.1) and a separate network (30.30.30.1 / 40.40.40.1)]

Stretching the Network

- Longer distance traditionally means greater network latency
- Too many missed health checks can cause false failover
- Heartbeating is fully configurable
  - SameSubnetDelay (default = 1 second): frequency at which heartbeats are sent
  - SameSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down
  - CrossSubnetDelay (default = 1 second): frequency at which heartbeats are sent to nodes on dissimilar subnets
  - CrossSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface to nodes on dissimilar subnets is considered down

Command Line: Cluster.exe /prop
PowerShell (R2): Get-Cluster | fl *
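
As a rough illustration of how a delay and a threshold combine, the Python sketch below multiplies the two to estimate how long a node can go unheard before it is treated as down. The class and function names are hypothetical, delays are written in milliseconds for convenience, and the 2000 ms / 10 heartbeat values are example inputs only, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatSettings:
    """Hypothetical container for one delay/threshold pair from the slide."""
    delay_ms: int = 1000   # interval between heartbeats (slide default: 1 second)
    threshold: int = 5     # missed heartbeats tolerated (slide default: 5)

    def detection_window_ms(self) -> int:
        # Time without heartbeats after which the interface is considered down.
        return self.delay_ms * self.threshold


def tolerates_outage(settings: HeartbeatSettings, outage_ms: int) -> bool:
    """True if a network blip of outage_ms is shorter than the detection window."""
    return outage_ms < settings.detection_window_ms()


defaults = HeartbeatSettings()
relaxed = HeartbeatSettings(delay_ms=2000, threshold=10)  # example values only

print(defaults.detection_window_ms())    # 5000
print(relaxed.detection_window_ms())     # 20000
print(tolerates_outage(defaults, 6000))  # False: a 6 s blip could trigger failover
print(tolerates_outage(relaxed, 6000))   # True
```

The trade-off is that a wider window avoids false failovers on a slow WAN link but also delays detection of a real failure.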

Security over the WAN

Encrypt inter-node traffic:
- 0 = clear text
- 1 = signed (default)
- 2 = encrypted

[Diagram: Site A and Site B; 10.10.10.1 / 20.20.20.1 and 30.30.30.1 / 40.40.40.1]

Enhanced Dependencies – OR

Network Name resource stays up if either IP Address Resource A OR IP Address Resource B is up

[Diagram: Network Name resource depends on IP Address Resource A OR IP Address Resource B]
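
The OR dependency can be pictured as a simple boolean rule. The Python sketch below is only a model of that rule, with a hypothetical function name and resource labels taken from the slide; it is not how the cluster service itself evaluates dependencies.

```python
from typing import Mapping


def network_name_can_stay_online(ip_online: Mapping[str, bool],
                                 dependency: str = "OR") -> bool:
    """Model of a Network Name that depends on several IP Address resources."""
    states = list(ip_online.values())
    if dependency == "OR":
        return any(states)   # one online IP address is enough
    if dependency == "AND":
        return all(states)   # every IP address must be online
    raise ValueError("dependency must be 'OR' or 'AND'")


# After a cross-subnet failover only the IP address for the new site is online.
after_failover = {
    "IP Address Resource A": False,
    "IP Address Resource B": True,
}

print(network_name_can_stay_online(after_failover, "OR"))   # True
print(network_name_can_stay_online(after_failover, "AND"))  # False
```

With an AND dependency the Network Name could never come online in a cluster whose nodes sit in different subnets, because only one site's IP address can be online at a time.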

Client Reconnect Considerations

- Nodes in dissimilar subnets
- Failover changes resource’s IP Address
- Clients need that new IP Address from DNS to reconnect

[Diagram: Site A (10.10.10.111) and Site B (20.20.20.222); DNS Server 1 and DNS Server 2 linked by DNS replication; the record FS = 10.10.10.111 is created, updated to FS = 20.20.20.222 after failover, and then obtained by clients]

Solution #1: Configure Network Name (NN) Settings

- RegisterAllProvidersIP (default = 0 for FALSE)
  - Determines if all IP Addresses for a Network Name will be registered in DNS
  - TRUE (1): IP Addresses can be online or offline and will still be registered
  - Ensure application is set to try all IP Addresses, so clients can connect quicker
- HostRecordTTL (default = 1200 seconds)
  - Controls time the DNS record lives on client for a cluster network name
  - Shorter TTL: DNS records for clients updated sooner
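
To show what "try all IP Addresses" means on the client side, here is a minimal Python sketch. The function name and the `try_connect` callback are hypothetical stand-ins for whatever connection logic a real application uses; the addresses are the example values from the slides.

```python
from typing import Callable, Iterable, Optional


def connect_first_available(addresses: Iterable[str],
                            try_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first address that accepts a connection, or None if none do."""
    for address in addresses:
        if try_connect(address):
            return address
    return None


# With RegisterAllProvidersIP = 1, DNS returns both site addresses for the name.
registered = ["10.10.10.111", "20.20.20.222"]

# Simulated state after a failover to Site B: only the Site B address answers.
online_now = {"20.20.20.222"}

reached = connect_first_available(registered, lambda a: a in online_now)
print(reached)  # 20.20.20.222
```

A client that stops after the first address would fail here until its cached DNS record expired, which is why this setting only helps applications that walk the whole list.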

Solution #2: Prefer Local Failover

- Local failover for higher availability
  - No change in IP Address
- Cross-site failover for disaster recovery

[Diagram: Site A (10.10.10.111) and Site B (20.20.20.222); DNS Server 1 and DNS Server 2; FS = 10.10.10.111]

Solution #3: Stretch VLANs

Deploying a VLAN minimizes client reconnection times

[Diagram: Site A and Site B on one stretched VLAN, both using 10.10.10.111; DNS Server 1 and DNS Server 2; FS = 10.10.10.111]

Solution #4: Abstraction in Device

- Network device uses 3rd IP
- 3rd IP is the one registered in DNS & used by client
- Example: http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/App_Networking/extmsftw2k8vistacisco.pdf

[Diagram: Site A (10.10.10.111) and Site B (20.20.20.222) behind a network device at 30.30.30.30; DNS Server 1 and DNS Server 2; FS = 30.30.30.30]

This is generic guidance…

If you have other creative ideas, that’s ok!

Multi-Site Clustering: Storage

Storage in Multi-Site Clusters

Different than local clusters:
- Multiple storage arrays – independent per site
- Nodes commonly access own site storage
- No “true” shared disk visible to all nodes

[Diagram: Site A and Site B]

Storage Considerations

Need a data replication mechanism between sites

Changes are made on Site A and replicated to Site B

[Diagram: Site A storage replicating to a replica at Site B]

Replication Alternatives

Replication levels:
- Hardware storage-based replication
- Software host-based replication
- Application-based replication

Synchronous vs. Asynchronous

Synchronous | Asynchronous
No data loss | Potential data loss on hard failures
Requires high bandwidth/low latency connection | Enough bandwidth to keep up with data replication
Stretches over shorter distances | Stretches over longer distances
Write latencies impact application performance | No significant impact on application performance
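
The distance and data-loss rows can be made concrete with some back-of-the-envelope arithmetic. The Python sketch below is an illustrative model only. It assumes light travels roughly 200 km per millisecond in optical fiber and ignores switching, protocol, and array overhead, so the synchronous figure is a lower bound. The function names and sample numbers are hypothetical.

```python
FIBER_KM_PER_MS = 200.0  # assumption: approximate propagation speed in fiber


def sync_write_latency_ms(local_write_ms: float, distance_km: float) -> float:
    """Synchronous write: local write plus one round trip to the remote array."""
    round_trip_ms = 2 * distance_km / FIBER_KM_PER_MS
    return local_write_ms + round_trip_ms


def async_exposure_mb(write_rate_mb_per_s: float, replication_lag_s: float) -> float:
    """Asynchronous replication: data not yet copied if the primary site is lost."""
    return write_rate_mb_per_s * replication_lag_s


print(sync_write_latency_ms(1.0, 100))    # 2.0  (ms per write at 100 km)
print(sync_write_latency_ms(1.0, 1000))   # 11.0 (ms per write at 1000 km)
print(async_exposure_mb(20, 30))          # 600  (MB at risk with a 30 s lag)
```

Under these assumptions every synchronous write pays the round trip, so the added latency grows with distance. Asynchronous replication keeps writes fast but leaves a window of unreplicated data.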

Cluster Validation and Replication

Multi-Site clusters are not required to pass the Storage tests to be supported

Validation Guide and Policy: http://go.microsoft.com/fwlink/?LinkID=119949

Multi-Site Clustering: Quorum

Quorum Overview

- Majority is greater than 50%
- Possible Voters:
  - Nodes (1 each) + 1 Witness (Disk or File Share)
- 4 Quorum Types:
  - Disk only (not recommended)
  - Node and Disk majority
  - Node majority
  - Node and File Share majority

[Diagram: voters, one vote each]
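
The "greater than 50%" rule is easy to check with integer arithmetic. The Python sketch below uses hypothetical function names and simply counts votes as described on the slide: one per node plus at most one witness.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is strictly more than half of the total."""
    return total_votes // 2 + 1


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    return reachable_votes >= votes_needed(total_votes)


print(votes_needed(5))    # 3: five nodes, no witness
print(votes_needed(4))    # 3: four nodes, so an even split (2 + 2) loses quorum
print(has_quorum(2, 4))   # False
print(has_quorum(3, 5))   # True: four nodes plus a witness, three votes reachable
```

This is why an even number of nodes is usually paired with a witness: the extra vote makes the total odd, so a clean split cannot leave both halves short of a majority.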

Replicated Disk Witness

- A witness is a decision maker when nodes lose network connectivity
- When a witness is not a single decision maker, problems occur
- Do not use in multi-site clusters unless directed by vendor

[Diagram: replicated storage from vendor used as the witness; three voters]

Node Majority

5 Node Cluster: Majority = 3, with the majority in the primary site

Cross-site network connectivity broken!
- Nodes in the primary site: Can I communicate with a majority of the nodes in the cluster? Yes, then stay up
- Nodes in the other site: Can I communicate with a majority of the nodes in the cluster? No, drop out of cluster membership

[Diagram: Site A and Site B, each with its own SAN]

Node Majority

5 Node Cluster: Majority = 3, with the majority in the primary site

Disaster at Site 1
- Nodes in the primary site: We are down!
- Nodes in the remaining site: Can I communicate with a majority of the nodes in the cluster? No, drop out of cluster membership

Need to force quorum manually

[Diagram: Site A and Site B, each with its own SAN]

Forcing Quorum

- Always understand why quorum was lost
- Used to bring cluster online without quorum
- Cluster starts in a special “forced” state
- Once majority achieved, no more “forced” state

Command Line: net start clussvc /fixquorum (or /fq)

PowerShell (R2): Start-ClusterNode -FixQuorum (or -fq)

Multi-Site With File Share Witness

Complete resiliency and automatic recovery from the loss of any 1 site

[Diagram: Site A and Site B, each with its own SAN and replicated storage, connected over the WAN; File Share Witness \\Foo\Cluster1 in Site C]

Multi-Site With File Share Witness

Complete resiliency and automatic recovery from the loss of connection between sites

- Nodes in one site: Can I communicate with a majority of the nodes (+FSW) in the cluster? Yes, then stay up
- Nodes in the other site: Can I communicate with a majority of the nodes in the cluster? No (lock failed), drop out of cluster membership

[Diagram: Site A and Site B, each with its own SAN and replicated storage, with the WAN link between them broken; File Share Witness \\Foo\Cluster1 in Site C]
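
The split described on this slide can be simulated in a few lines. The Python sketch below is a simplified model with hypothetical names: it assumes two sites, one file share witness, and that exactly one site manages to lock the witness after the inter-site link breaks.

```python
from typing import Optional


def surviving_site(nodes_a: int, nodes_b: int, witness_holder: str) -> Optional[str]:
    """Return the site ('A' or 'B') that keeps quorum after a site-to-site split."""
    total_votes = nodes_a + nodes_b + 1   # one vote per node plus the witness
    needed = total_votes // 2 + 1         # strict majority
    votes = {"A": nodes_a, "B": nodes_b}
    votes[witness_holder] += 1            # the site that locks the share gains its vote
    winners = [site for site, count in votes.items() if count >= needed]
    return winners[0] if winners else None


print(surviving_site(2, 2, "A"))  # A: 3 of 5 votes, Site B fails the lock and drops out
print(surviving_site(2, 2, "B"))  # B
```

Because the total is five votes, at most one side can reach three, so the two sites can never both stay up on their own.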

Quorum Model Summary

- No Majority: Disk Only
  - Not Recommended
  - Use as directed by vendor
- Node and Disk Majority
  - Use as directed by vendor
- Node Majority
  - Odd number of nodes
  - More nodes in primary site
- Node and File Share Majority
  - Even number of nodes
  - Best availability solution – FSW in 3rd site
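
The odd/even guidance in this summary can be written as a small helper. This Python sketch is a hypothetical illustration of the rule of thumb on the slide. It deliberately leaves out the two disk-based models, which the slide says to use only as directed by the storage vendor.

```python
def suggest_quorum_model(node_count: int) -> str:
    """Rule of thumb from the summary: keep the total number of votes odd."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count % 2 == 1:
        return "Node Majority"                 # odd node count: votes already odd
    return "Node and File Share Majority"      # even node count: witness adds a vote


print(suggest_quorum_model(5))  # Node Majority
print(suggest_quorum_model(4))  # Node and File Share Majority
```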

Multi-Site Clustering: Workloads

Hyper-V in a Multi-Site Cluster

Considerations by area:

Network
- On cross-subnet failover, if guest is …
  - DHCP, then IP updated automatically
  - Statically configured IP, then admin needs to configure new IP
- Use of a VLAN preferred with live migration between sites

Storage
- 3rd party replication solution required
- Configuration with CSV (explained next)

Quorum
- No special considerations

Links: http://technet.microsoft.com/en-us/library/dd197488.aspx

CSV in a Multi-Site Cluster

- Architectural assumptions collide…
  - Replication solutions assume only 1 array accessed at a time
  - CSV assumes all nodes can concurrently access the LUN
- CSV is not required for Live Migration
- Talk to your storage vendor for their support story

[Diagram: a VHD on read/write storage used by nodes in the primary site is replicated to a read-only copy for nodes in the disaster recovery site; a VM there attempts to access the replica]

SQL in a Multi-Site Cluster

Considerations by area:

Network
- SQL does not support OR dependency
- Need to stretch VLAN between sites

Storage
- No special considerations
- 3rd party replication solution required

Quorum
- No special considerations

Links:
- http://technet.microsoft.com/en-us/library/ms189134.aspx
- http://technet.microsoft.com/en-us/library/ms178128.aspx

Exchange in a Multi-Site Cluster

Considerations by area:

Network
- No VLAN needed
- Change HostRecordTTL from 20 minutes to 5 minutes
- CCR supports 2 nodes, one per site

Storage
- Exchange CCR provides application-based replication

Quorum
- File share witness on the Hub Transport server on primary site

Links:
- http://technet.microsoft.com/en-us/library/bb124721.aspx
- http://technet.microsoft.com/en-us/library/aa998848.aspx

Demo: Setting up a cluster and Live Migration

Demo Environment Overview

- HVNODE1 (Microsoft Hyper-V Server 2008 R2)
- HVNODE2 (Windows Server 2008 R2 deployed as Server Core)
- Gigabit Switch
- CONTOSO: Domain Controller and iSCSI storage

Session Summary

- Multi-Site Failover Clustering has many benefits
- Redundancy is needed everywhere
- Understand your replication needs
- Compare VLANs with multiple subnets
- Plan quorum model & nodes before deployment
- Follow the checklist and best practices

Resources

- Cluster Team Blog: http://blogs.msdn.com/clustering/
- Cluster Information Portal: http://www.microsoft.com/windowsserver2008/en/us/clustering-home.aspx
- Clustering Technical Resources: http://www.microsoft.com/windowsserver2008/en/us/clustering-resources.aspx
- Clustering Forum (2008): http://forums.technet.microsoft.com/en-US/winserverClustering/threads/
- Clustering Forum (2008 R2): http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2highavailability/threads/
- Clustering Newsgroup: http://www.microsoft.com/communities/newsgroups/list/en-us/default.aspx?dg=microsoft.public.windows.server.clustering
- Failover Clustering Deployment Guide: http://technet.microsoft.com/en-us/library/dd197477.aspx
- TechNet: Configure a Service or Application for High Availability: http://technet.microsoft.com/en-us/library/cc732478.aspx
- TechNet: Installing a Failover Cluster: http://technet.microsoft.com/en-us/library/cc772178.aspx
- TechNet: Creating a Failover Cluster: http://technet.microsoft.com/en-us/library/cc755009.aspx
- Webcast (2008 R2): Introduction to Failover Clustering: http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032407190&Culture=en-US
- Webcast (2008 R2): HA Basics with Hyper-V: http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032407222&Culture=en-US
- Webcast (2008 R2): Cluster Shared Volumes (CSV): http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032407238&Culture=en-US

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.