Hyper-V R2 High-Availability DEEP DIVE! Greg Shields, MVP, vExpert, Head Geek, Concentrated Technology, www.ConcentratedTech.com


Page 1: Hyper v r2 deep dive

Hyper-V R2 High-Availability
DEEP DIVE!

Greg Shields, MVP, vExpert
Head Geek, Concentrated Technology
www.ConcentratedTech.com

Page 2: Hyper v r2 deep dive

This slide deck was used in one of our many conference presentations. We hope you enjoy it, and invite you to use it within your own organization however you like.

For more information on our company, including information on private classes and upcoming conference appearances, please visit our Web site, www.ConcentratedTech.com.

For links to newly-posted decks, follow us on Twitter: @concentrateddon or @concentratdgreg

This work is copyright ©Concentrated Technology, LLC

Page 3: Hyper v r2 deep dive

Agenda

Part I – Understanding Live Migration's Role in Hyper-V HA

Part II – The Fundamentals of Windows Failover Clustering

Part III – Building a Two-Node Hyper-V Cluster with iSCSI Storage

Part IV – Walking through the Management of a Hyper-V Cluster

Part V – Adding Disaster Recovery with Multi-Site Clustering


Page 4: Hyper v r2 deep dive

Part I
Understanding Live Migration's Role in Hyper-V HA

Page 5: Hyper v r2 deep dive

Do You Really Need HA?

High-availability adds dramatically greater uptime for virtual machines:
– Protection against host failures
– Protection against resource overuse
– Protection against scheduled/unscheduled downtime

High-availability also adds much greater cost…
– Shared storage between hosts
– Connectivity
– Higher (and more expensive) software editions

Not every environment needs HA!


Page 6: Hyper v r2 deep dive

What Really is Live Migration?


Part 1: Protection from Host Failures

Page 7: Hyper v r2 deep dive

What Really is Live Migration?

[Diagram: a VM live-migrates over the network from an overloaded virtual host to an underloaded virtual host; both hosts attach to shared storage]

Part 2: Load Balancing of VM/host Resources

Page 8: Hyper v r2 deep dive

Comparing Quick w/ Live Migration

Simply put: migration speed is the difference.
– In Hyper-V's original release, a virtual machine could be relocated with "a minimum" of downtime.
– This downtime was directly related to…
…the amount of memory assigned to the virtual machine
…the connection speed between virtual hosts and shared storage.
– Virtual machines with more assigned virtual memory and slower networks took longer to complete a migration from one host to another; those with less could complete it in less time.

With QM, a VM with 2 GB of vRAM could take 32 seconds or longer to migrate! Downtime ensues…
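That 32-second figure is easy to sanity-check with back-of-the-envelope arithmetic. A rough sketch, assuming the saved state is written to and then read back from shared storage at roughly gigabit throughput (the 125 MB/s figure is an assumption for illustration, not a measured number):

```python
# Back-of-the-envelope Quick Migration downtime for a 2 GB VM.
# The throughput figure is an assumption for illustration only.
vram_mb = 2048           # 2 GB of assigned vRAM
throughput_mb_s = 125.0  # ~1 Gbps path to shared storage (assumed)

write_s = vram_mb / throughput_mb_s  # source host saves memory to disk
read_s = vram_mb / throughput_mb_s   # target host reads it back
downtime_s = write_s + read_s

print(f"Estimated downtime: {downtime_s:.0f} seconds")  # ~33 seconds
```

The write-then-read round trip is what lands this in the "32 seconds or longer" range; faster storage paths shrink it, more vRAM grows it.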


Page 9: Hyper v r2 deep dive

Comparing Quick w/ Live Migration

Down/dirty details…
– During a Quick Migration, the virtual machine is immediately put into a "Saved" state.
– This state is not a power down, nor is it the same as the Paused state.
– In the saved state – and unlike pausing – the virtual machine releases its memory reservation on the host machine and stores the contents of its memory pages to disk.
– Once this has completed, the target host can take over ownership of the virtual machine and bring it back into operation.


Page 10: Hyper v r2 deep dive

Comparing Quick w/ Live Migration

Down/dirty details…
– This saving of virtual machine state consumes most of the time involved in a Quick Migration.
– Reducing this delay required a mechanism to pre-copy the virtual machine's memory from source to target host.
– At the same time, the pre-copy logs changes to memory pages that occur during the copy. These changes tend to be relatively small in quantity, making the delta copy significantly smaller and faster than the original copy.
– Once the initial copy has completed, Live Migration then…
…pauses the virtual machine
…copies the memory deltas
…transfers ownership to the target host.

Much faster. Effectively "zero" downtime.
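The pre-copy sequence above can be sketched in a few lines. This is a toy model, not Hyper-V's actual implementation; the page count and the 2% dirty rate are assumptions for illustration:

```python
import random

def live_migrate(memory_pages, dirty_rate=0.02):
    """Toy sketch of Live Migration's pre-copy scheme (illustrative only)."""
    # Full pre-copy while the VM keeps running on the source host.
    copied = dict(memory_pages)

    # Pages the running VM dirties during the copy are logged...
    dirtied = [p for p in memory_pages if random.random() < dirty_rate]

    # ...then the VM pauses only long enough to re-send the small delta
    # and transfer ownership to the target host.
    for page in dirtied:
        copied[page] = memory_pages[page]

    return copied, len(dirtied)

pages = {n: f"contents-{n}" for n in range(10_000)}
target_copy, delta_pages = live_migrate(pages)

assert target_copy == pages  # target ends up with identical memory
print(f"Delta copy re-sent only {delta_pages} of {len(pages)} pages")
```

The key point the sketch shows: the pause covers only the small delta and the ownership handoff, not the full memory copy, which is why downtime is effectively zero.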


Page 11: Hyper v r2 deep dive

Part II
The Fundamentals of Windows Failover Clustering

Page 12: Hyper v r2 deep dive

Why Clustering Fundamentals?

Isn't this, after all, a workshop on Hyper-V?

It is, but the only way to do highly-available Hyper-V is atop Windows Failover Clustering.
– Many people have given clustering a pass due to early difficulties with its technologies.
– Microsoft did us all a disservice by making every previous version of Failover Clustering ridiculously painful to implement.
– Most IT pros have no experience with clustering.
– …but clustering doesn't have to be hard. It just feels like it does!

Doing clustering badly means doing HA Hyper-V badly!

Page 13: Hyper v r2 deep dive

Clustering's Sordid History

Windows NT 4.0
– Microsoft Cluster Service ("Wolfpack")
– High-availability service that reduced availability
– "As the corporate expert in Windows clustering, I recommend you don't use Windows clustering."

Windows 2000
– Greater availability, scalability. Still painful.

Windows 2003
– Added iSCSI storage to traditional Fibre Channel
– SCSI Resets still used as method of last resort (painful)

Windows 2008
– Eliminated use of SCSI Resets
– Eliminated full-solution HCL requirement
– Added Cluster Validation Wizard and pre-cluster tests
– First version truly usable by IT generalists

Page 14: Hyper v r2 deep dive

What's New & Changed in 2008

x64 EE gets up to 16 nodes.
Backups get VSS support.
Disks can be brought online without taking dependencies offline. This allows disk extension without downtime.
GPT disks are supported.
Cluster self-healing. No longer reliant on disk signatures. Multiple paths for identifying "lost" or failed disks.
IPv6 & DHCP support.
Network Name resource now uses DNS instead of WINS.
Network Name resource more resilient. Loss of an IP address need not bring the Network Name resource offline.
Geo-clustering! a.k.a. cross-subnet clustering. Cluster communications use TCP unicast and can span subnets.

Page 15: Hyper v r2 deep dive

So, What IS a Cluster?

Page 16: Hyper v r2 deep dive

So, What IS a Cluster?

[Diagram: cluster nodes sharing a quorum drive and storage for Hyper-V VMs]

Page 17: Hyper v r2 deep dive

Cluster Quorum Models

Ever been to a Kiwanis meeting…?

A cluster "exists" because it has quorum between its members. That quorum is achieved through a voting process.
– Different Kiwanis clubs have different rules for quorum.
– Different clusters have different rules for quorum.

If a cluster "loses quorum", the entire cluster shuts down and ceases to exist until quorum is regained.
– This is much different than a resource failover, which is the reason why clusters are implemented.

Multiple quorum models exist, for different reasons.

Page 18: Hyper v r2 deep dive

Node & Disk Majority

Node & Disk Majority eliminates Win2003's quorum disk as a point of failure. Works on a "voting system":
– A two-node cluster gets three votes.
– One for each node and one for the quorum disk.
– Two votes are needed for quorum.

Because of this model, the loss of the quorum disk only results in the loss of one vote.

Used when an even number of nodes are in the cluster.
Most-deployed model in production.
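The voting math is simple enough to sketch. A minimal model of Node & Disk Majority voting; the function name and signature are illustrative, not a real clustering API:

```python
def has_quorum(nodes_up, total_nodes, witness_disk_up):
    """Node & Disk Majority voting: one vote per node plus one for the
    witness (quorum) disk; a strict majority of votes keeps the cluster up.
    Illustrative sketch only, not a real clustering API."""
    total_votes = total_nodes + 1                 # nodes + quorum disk
    votes = nodes_up + (1 if witness_disk_up else 0)
    return votes > total_votes // 2               # majority required

# A two-node cluster has three votes; two are needed for quorum.
assert has_quorum(2, 2, witness_disk_up=False)      # disk lost: 2 of 3, still up
assert has_quorum(1, 2, witness_disk_up=True)       # one node lost: still up
assert not has_quorum(1, 2, witness_disk_up=False)  # node AND disk lost: down
```

The assertions show why the quorum disk is no longer a single point of failure: losing it costs only one of three votes.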

Page 19: Hyper v r2 deep dive

Node Majority

Only shared storage devices get votes; replicated storage does not.
Requires 3+ votes, so needs a minimum of three members.

Used when the number of cluster nodes is odd.
Can use replicated storage instead of shared storage. Handy for stretch clusters.

Page 20: Hyper v r2 deep dive

File Share Witness Model

Clustering without the nasty (expensive) shared storage!
– (Sort of… OK… not really…)

One file server can serve as witness for multiple clusters.
Can be used for non-production Hyper-V clusters. (eval/demo only)

Most flexible model for stretch clusters. Eliminates issues of complete site outage.

Page 21: Hyper v r2 deep dive

Witness Disk Model

Nodes get no votes. Only the quorum disk does.
Cluster remains up as long as one node can talk to the witness disk.
Effectively the same as the legacy model. Bad. SPOF. Don't use.

Page 22: Hyper v r2 deep dive

4 Steps to Cluster!

Step 1: Configure shared storage.
– Hardware SAN
– Software SAN à la StarWind iSCSI Target Software
Step 2: Attach Hyper-V hosts to the iSCSI target.
Step 3: Configure Windows Failover Clustering.
Step 4: Configure Hyper-V.

Page 23: Hyper v r2 deep dive

Part III
-VIDEO-
Building a Two-Node Hyper-V Cluster with iSCSI Storage

Page 24: Hyper v r2 deep dive

Part IV
Walking through the Management of a Hyper-V Cluster

Page 25: Hyper v r2 deep dive

Cluster Shared Volumes

Hyper-V v.1 required a single VM per LUN.
v.1's clustering underpinnings weren't aware of the files on a LUN. The "disk" was the cluster resource to fail over.
– Remember that only one node at a time can own a resource.

v.2 adds cluster-awareness to individual volumes.
This means that individual files on a LUN can be owned by different hosts. Hosts respect each other's ownership.

Page 26: Hyper v r2 deep dive

Cluster Shared Volumes

Because NTFS is still the file system, this means creating a meta-system of ownership information.

Each cluster node checks for ownership, respects the ownership of others, and updates the info when it takes over ownership.

Designed for use only by Hyper-V's tiny number of files.
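The ownership meta-system can be pictured as a per-file map sitting alongside the NTFS volume. A toy sketch with made-up paths and host names; real CSV metadata is internal to clustering, not a Python dict:

```python
# Toy model of CSV's ownership metadata: the LUN stays a single NTFS
# volume, but a per-file map records which node owns each VM's files.
# Paths and host names are made up for illustration.
csv_ownership = {
    "Volume1/VM-A.vhd": "HOST1",
    "Volume1/VM-B.vhd": "HOST2",  # different owner, same LUN
}

def take_ownership(path, node):
    """A node records itself as owner (e.g. after failover or live
    migration); the other nodes respect whatever the map says."""
    previous = csv_ownership.get(path)
    csv_ownership[path] = node
    return previous

old_owner = take_ownership("Volume1/VM-A.vhd", "HOST2")
print(old_owner, "->", csv_ownership["Volume1/VM-A.vhd"])  # prints: HOST1 -> HOST2
```

This is why v.2 no longer has to fail over the whole disk resource: ownership moves per file, not per LUN.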

Page 27: Hyper v r2 deep dive

Going Beyond Two Nodes

Windows Failover Clustering gets non-linearly more complex as you add more hosts.
– Complexity arrives in failover options.

Some critical best practices:
– Manage Preferred Owners & Persistent Mode options correctly.
– Consider carefully the effects of Failback.
– Resist creating hybrid clusters that support other services.
– Integrate SCVMM for dramatically improved management.
– Use disk "dependencies" as Affinity/Anti-Affinity rules.
– Add servers in pairs.
– Segregate traffic!!!

Page 28: Hyper v r2 deep dive

Best Practices in Network Segregation

Page 29: Hyper v r2 deep dive

Best Practices in Network Segregation

Page 30: Hyper v r2 deep dive

-DEMO-
Walking through the Management of a Hyper-V Cluster


Page 31: Hyper v r2 deep dive

Part V
Adding Disaster Recovery with Multi-Site Clustering

Page 32: Hyper v r2 deep dive

What Makes a Disaster?

Which of the following would you consider a disaster?

– A naturally-occurring event, such as a tornado, flood, or hurricane, impacts your datacenter and causes damage. That damage causes the entire processing of that datacenter to cease.

– A widespread incident, such as a water leakage or long-term power outage, that interrupts the functionality of your datacenter for an extended period of time.

– A problem with a virtual host creates a “blue screen of death”, immediately ceasing all processing on that server.

– An administrator installs a piece of code that causes problems with a service, shutting down that service and preventing some action from occurring on the server.

– An issue with power connections causes a server or an entire rack of servers to inadvertently and rapidly power down.

Page 33: Hyper v r2 deep dive

What Makes a Disaster?

(Same list as the previous slide, with the verdict overlaid: the first two scenarios are a DISASTER! The other three are JUST A BAD DAY!)

Page 34: Hyper v r2 deep dive

What Makes a Disaster?

Your business' decision to "declare a disaster" and move to "disaster operations" is a major one.

The technologies used for disaster protection are different from those used for HA.
– More complex. More expensive.

Failover and failback processes involve more thought.

Page 35: Hyper v r2 deep dive

What Makes a Disaster?

At a very high level, disaster recovery for virtual environments is three things:
– A storage mechanism
– A replication mechanism
– A set of target servers to receive virtual machines and their data

Page 36: Hyper v r2 deep dive

What Makes a Disaster?

[Diagram: a primary Hyper-V server and its iSCSI storage device replicate to a backup site holding storage device(s), a replication mechanism, and backup Hyper-V target servers]

Page 37: Hyper v r2 deep dive

Storage Device

Typically, two SANs in two different locations.
– Fibre Channel or iSCSI
– Usually a similar model or manufacturer. This is often necessary for the replication mechanism to function properly.

The backup SAN doesn't necessarily need to be the same size as the primary SAN.
– Replicated data isn't always the full set of data.

Page 38: Hyper v r2 deep dive

Replication Mechanism

Replication between SANs can occur…

Synchronously
– Changes are made on one node at a time. Subsequent changes on the primary SAN must wait for an ACK from the backup SAN.

Asynchronously
– Changes on the backup SAN will eventually be written. They are queued at the primary SAN to be transferred at intervals.

Page 39: Hyper v r2 deep dive

Replication Mechanism

Synchronously
– Changes are made on one node at a time. Subsequent changes on the primary SAN must wait for an ACK from the backup SAN.

[Diagram: primary-site iSCSI storage ↔ backup-site iSCSI storage. 1) Change committed at primary site → 2) change replicated to secondary site → 3) change committed at secondary site → 4) acknowledgement of change returned to primary site → 5) change complete]

Page 40: Hyper v r2 deep dive

Replication Mechanism

Asynchronously
– Changes on the backup SAN will eventually be written. They are queued at the primary SAN to be transferred at intervals.

[Diagram: primary-site iSCSI storage → backup-site iSCSI storage. Changes 1–3 commit at the primary site and are replicated to the secondary site in one batch, while change 4 continues committing at the primary site]

Page 41: Hyper v r2 deep dive

Replication Mechanism

Which should you choose…?

Synchronous
– Assures no loss of data.
– Requires a high-bandwidth, low-latency connection.
– Write and acknowledgement latencies impact performance.
– Requires shorter distances between storage devices.

Asynchronous
– Potential for loss of data during a failure.
– Leverages smaller-bandwidth connections; more tolerant of latency.
– No performance impact.
– Potential to stretch across longer distances.

Your Recovery Point Objective makes this decision…
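The data-loss trade-off can be shown with a toy model of the two write paths. Function names and the list-based "SANs" are illustrative only; real replication runs inside SAN firmware:

```python
# Toy contrast of the two write paths; lists stand in for SANs.
def synchronous_write(primary, backup, change):
    primary.append(change)
    backup.append(change)       # must land and ACK before the write returns
    return "acknowledged"       # zero data loss, but every write pays latency

def asynchronous_write(primary, queue, change):
    primary.append(change)      # returns immediately
    queue.append(change)        # shipped to the backup SAN at intervals
    return "committed locally"  # the queue is the Recovery Point exposure

primary, backup, queue = [], [], []
synchronous_write(primary, backup, "change-1")
asynchronous_write(primary, queue, "change-2")

# If the primary site fails right now, the synchronous change survived
# at the backup, but the still-queued asynchronous change is lost.
assert "change-1" in backup
assert "change-2" in queue and "change-2" not in backup
```

The final assertions are the Recovery Point Objective in miniature: whatever sits in the async queue at the moment of failure is the data you lose.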

Page 42: Hyper v r2 deep dive

Replication Mechanism

Replication processing can occur at the…

Storage Layer
– Replication processing is handled by the SAN itself. Often agents are installed on virtual hosts or machines to ensure crash consistency.
– Easier to set up, fewer moving parts. More scalable. Concerns about crash consistency.

OS / Application Layer
– Replication processing is handled by software in the VM OS. This software also operates as the agent.
– More challenging to set up, more moving parts. More installations to manage/monitor. Scalability and cost are linear. Fewer concerns about crash consistency.

Page 43: Hyper v r2 deep dive

The Problem with Transactional Databases

O/S crash consistency is easy to obtain.
– Just quiesce the file system before beginning the replication.

Application crash consistency is much harder.
– Transactional databases like AD, Exchange, and SQL don't quiesce when the file system does.
– Need to stop these databases before quiescence.
– Or, need an agent in the VM that handles DB quiescence.

Replication without crash consistency will lose data. The DB comes back in an "inconsistent" state.

Page 44: Hyper v r2 deep dive

Four-Step Process for VSS

Step 1: A requestor, such as replication software, asks the server to invoke a shadow copy.
Step 2: A provider accepts the request and calls an application-specific writer (SQL, Exchange, etc.) if necessary.
Step 3: The application-specific writer coordinates the system shadow copy with application quiescence to ensure application consistency.
Step 4: The shadow copy is created.

…then the replication can start…
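Those four steps can be mimicked as a sequence of calls to show the ordering. Everything here is hypothetical naming; real VSS is a Windows COM API, and this Python sketch only mirrors the flow:

```python
# Hypothetical sketch mirroring the four VSS steps in order. Real VSS is
# a Windows COM API; none of these class or method names are real.
class ShadowCopySession:
    def __init__(self):
        self.log = []

    def request(self, requestor):              # Step 1: requestor asks
        self.log.append(f"{requestor} requested shadow copy")

    def quiesce(self, apps):                   # Steps 2-3: app-specific
        for app in apps:                       # components flush and hold
            self.log.append(f"{app} quiesced") # writes for consistency

    def snapshot(self):                        # Step 4: copy created
        self.log.append("shadow copy created")

session = ShadowCopySession()
session.request("replication software")
session.quiesce(["SQL", "Exchange"])
session.snapshot()
print("\n".join(session.log))                  # ...then replication starts
```

The ordering is the point: the databases are quiesced before the shadow copy exists, which is what makes the replicated copy application-consistent rather than merely crash-consistent.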


Page 45: Hyper v r2 deep dive

Target Servers & Cluster

Finally, there is a set of target servers in the backup site.

With Hyper-V, these servers are part of a Multi-Site Hyper-V cluster.
– A multi-site cluster is the exact same thing as a single-site cluster, except that it spans multiple sites.
– Some changes to management and configuration tactics are required.

Page 46: Hyper v r2 deep dive

Target Servers & Cluster

[Diagram: Hyper-V servers and iSCSI storage at the primary site connected through network switches to Hyper-V servers and iSCSI storage at the backup site]

Page 47: Hyper v r2 deep dive

Multi-Site Cluster Tactics

Install servers so that your primary site always contains more servers than your backup sites.
– Eliminates some problems with quorum during a site outage.

Page 48: Hyper v r2 deep dive

Multi-Site Cluster Tactics

Leverage Node and File Share Majority quorum when possible.
– Prevents an entire-site outage from impacting quorum.
– Enables creation of multiple clusters if necessary.

[Diagram: primary-site and backup-site Hyper-V servers with iSCSI storage, linked by network switches, plus a witness server at a third site]

Page 49: Hyper v r2 deep dive

Multi-Site Cluster Tactics

Ensure that networking remains available when VMs migrate from the primary to the backup site.
– R2 clustering can now span subnets. This seems like a good thing, but only if you plan correctly for it.
– Remember that crossing subnets also means changing the IP address, subnet mask, gateway, etc., at the new site.
– This can be done automatically using DHCP and dynamic DNS, or must be updated manually.
– DNS replication is also a problem. Clients will require time to update their local cache.
– Consider reducing the DNS TTL or clearing the client cache.
