Hyper-V R2 High-Availability DEEP DIVE! Greg Shields, MVP, vExpert, Head Geek, Concentrated Technology, www.ConcentratedTech.com


Page 1: Hyper v r2 deep dive

Hyper-V R2 High-Availability
DEEP DIVE!

Greg Shields, MVP, vExpert
Head Geek, Concentrated Technology
www.ConcentratedTech.com

Page 2: Hyper v r2 deep dive

This slide deck was used in one of our many conference presentations. We hope you enjoy it, and invite you to use it within your own organization however you like.

For more information on our company, including information on private classes and upcoming conference appearances, please visit our Web site, www.ConcentratedTech.com.

For links to newly-posted decks, follow us on Twitter: @concentrateddon or @concentratdgreg

This work is copyright ©Concentrated Technology, LLC

Page 3: Hyper v r2 deep dive

Agenda

Part I – Understanding Live Migration's Role in Hyper-V HA

Part II – The Fundamentals of Windows Failover Clustering

Part III – Building a Two-Node Hyper-V Cluster with iSCSI Storage

Part IV – Walking through the Management of a Hyper-V Cluster

Part V – Adding Disaster Recovery with Multi-Site Clustering


Page 4: Hyper v r2 deep dive

Part I
Understanding Live Migration's Role in Hyper-V HA

Page 5: Hyper v r2 deep dive

Do You Really Need HA?

High-availability adds dramatically greater uptime for virtual machines:
– Protection against host failures
– Protection against resource overuse
– Protection against scheduled/unscheduled downtime

High-availability also adds much greater cost…
– Shared storage between hosts
– Connectivity
– Higher (and more expensive) software editions

Not every environment needs HA!


Page 6: Hyper v r2 deep dive

What Really is Live Migration?


Part 1: Protection from Host Failures

Page 7: Hyper v r2 deep dive

What Really is Live Migration?

[Diagram: a VM live-migrates over the network from an overloaded virtual host to an underloaded virtual host; both hosts attach to shared storage]

Part 2: Load Balancing of VM/host Resources

Page 8: Hyper v r2 deep dive

Comparing Quick w/ Live Migration

Simply put: migration speed is the difference.
– In Hyper-V's original release, a virtual machine could be relocated with "a minimum" of downtime.
– This downtime was directly related to…
…the amount of memory assigned to the virtual machine
…the connection speed between virtual hosts and shared storage.
– Virtual machines with more assigned virtual memory and slower networks took longer to complete a migration from one host to another; those with less could complete it in less time.

With QM, a VM with 2 GB of vRAM could take 32 seconds or longer to migrate! Downtime ensues…
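That 32-second figure is easy to sanity-check with back-of-the-envelope arithmetic. A rough sketch, assuming the saved state is written to and then read back from shared storage at roughly gigabit throughput (the 125 MB/s figure is an assumption for illustration, not a measured number):

```python
# Back-of-the-envelope Quick Migration downtime for a 2 GB VM.
# The throughput figure is an assumption for illustration only.
vram_mb = 2048           # 2 GB of assigned vRAM
throughput_mb_s = 125.0  # ~1 Gbps path to shared storage (assumed)

write_s = vram_mb / throughput_mb_s  # source host saves memory to disk
read_s = vram_mb / throughput_mb_s   # target host reads it back
downtime_s = write_s + read_s

print(f"Estimated downtime: {downtime_s:.0f} seconds")  # ~33 seconds
```

The write-then-read round trip is what lands this in the "32 seconds or longer" range; faster storage paths shrink it, more vRAM grows it.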


Page 9: Hyper v r2 deep dive

Comparing Quick w/ Live Migration

Down/dirty details…
– During a Quick Migration, the virtual machine is immediately put into a "Saved" state.
– This state is not a power down, nor is it the same as the Paused state.
– In the saved state – and unlike pausing – the virtual machine releases its memory reservation on the host machine and stores the contents of its memory pages to disk.
– Once this has completed, the target host can take over ownership of the virtual machine and bring it back into operation.


Page 10: Hyper v r2 deep dive

Comparing Quick w/ Live Migration

Down/dirty details…
– This saving of virtual machine state consumes most of the time involved in a Quick Migration.
– Reducing this delay required a mechanism to pre-copy the virtual machine's memory from source to target host.
– At the same time, the pre-copy logs changes to memory pages that occur during the copy. These changes tend to be relatively small in quantity, making the delta copy significantly smaller and faster than the original copy.
– Once the initial copy has completed, Live Migration then…
…pauses the virtual machine
…copies the memory deltas
…transfers ownership to the target host.

Much faster. Effectively "zero" downtime.
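The pre-copy sequence above can be sketched in a few lines. This is a toy model, not Hyper-V's actual implementation; the page count and the 2% dirty rate are assumptions for illustration:

```python
import random

def live_migrate(memory_pages, dirty_rate=0.02):
    """Toy sketch of Live Migration's pre-copy scheme (illustrative only)."""
    # Full pre-copy while the VM keeps running on the source host.
    copied = dict(memory_pages)

    # Pages the running VM dirties during the copy are logged...
    dirtied = [p for p in memory_pages if random.random() < dirty_rate]

    # ...then the VM pauses only long enough to re-send the small delta
    # and transfer ownership to the target host.
    for page in dirtied:
        copied[page] = memory_pages[page]

    return copied, len(dirtied)

pages = {n: f"contents-{n}" for n in range(10_000)}
target_copy, delta_pages = live_migrate(pages)

assert target_copy == pages  # target ends up with identical memory
print(f"Delta copy re-sent only {delta_pages} of {len(pages)} pages")
```

The key point the sketch shows: the pause covers only the small delta and the ownership handoff, not the full memory copy, which is why downtime is effectively zero.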


Page 11: Hyper v r2 deep dive

Part II
The Fundamentals of Windows Failover Clustering

Page 12: Hyper v r2 deep dive

Why Clustering Fundamentals?

Isn't this, after all, a workshop on Hyper-V?

It is, but the only way to do highly-available Hyper-V is atop Windows Failover Clustering.
– Many people have given clustering a pass due to early difficulties with its technologies.
– Microsoft did us all a disservice by making every previous version of Failover Clustering ridiculously painful to implement.
– Most IT pros have no experience with clustering.
– …but clustering doesn't have to be hard. It just feels like it does!

Doing clustering badly means doing HA Hyper-V badly!

Page 13: Hyper v r2 deep dive

Clustering's Sordid History

Windows NT 4.0
– Microsoft Cluster Service ("Wolfpack")
– High-availability service that reduced availability
– "As the corporate expert in Windows clustering, I recommend you don't use Windows clustering."

Windows 2000
– Greater availability, scalability. Still painful.

Windows 2003
– Added iSCSI storage to traditional Fibre Channel
– SCSI Resets still used as method of last resort (painful)

Windows 2008
– Eliminated use of SCSI Resets
– Eliminated full-solution HCL requirement
– Added Cluster Validation Wizard and pre-cluster tests
– First version truly usable by IT generalists

Page 14: Hyper v r2 deep dive

What's New & Changed in 2008

x64 EE gets up to 16 nodes.
Backups get VSS support.
Disks can be brought online without taking dependencies offline. This allows disk extension without downtime.
GPT disks are supported.
Cluster self-healing. No longer reliant on disk signatures. Multiple paths for identifying "lost" or failed disks.
IPv6 & DHCP support.
Network Name resource now uses DNS instead of WINS.
Network Name resource more resilient. Loss of an IP address need not bring the Network Name resource offline.
Geo-clustering! a.k.a. cross-subnet clustering. Cluster communications use TCP unicast and can span subnets.

Page 15: Hyper v r2 deep dive

So, What IS a Cluster?

Page 16: Hyper v r2 deep dive

So, What IS a Cluster?

[Diagram: cluster nodes sharing a quorum drive and storage for Hyper-V VMs]

Page 17: Hyper v r2 deep dive

Cluster Quorum Models

Ever been to a Kiwanis meeting…?

A cluster "exists" because it has quorum between its members. That quorum is achieved through a voting process.
– Different Kiwanis clubs have different rules for quorum.
– Different clusters have different rules for quorum.

If a cluster "loses quorum", the entire cluster shuts down and ceases to exist until quorum is regained.
– This is much different than a resource failover, which is the reason why clusters are implemented.

Multiple quorum models exist, for different reasons.

Page 18: Hyper v r2 deep dive

Node & Disk Majority

Node & Disk Majority eliminates Win2003's quorum disk as a point of failure. Works on a "voting system":
– A two-node cluster gets three votes.
– One for each node and one for the quorum disk.
– Two votes are needed for quorum.

Because of this model, the loss of the quorum disk only results in the loss of one vote.

Used when an even number of nodes are in the cluster.
Most-deployed model in production.
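The voting math is simple enough to sketch. A minimal model of Node & Disk Majority voting; the function name and signature are illustrative, not a real clustering API:

```python
def has_quorum(nodes_up, total_nodes, witness_disk_up):
    """Node & Disk Majority voting: one vote per node plus one for the
    witness (quorum) disk; a strict majority of votes keeps the cluster up.
    Illustrative sketch only, not a real clustering API."""
    total_votes = total_nodes + 1                 # nodes + quorum disk
    votes = nodes_up + (1 if witness_disk_up else 0)
    return votes > total_votes // 2               # majority required

# A two-node cluster has three votes; two are needed for quorum.
assert has_quorum(2, 2, witness_disk_up=False)      # disk lost: 2 of 3, still up
assert has_quorum(1, 2, witness_disk_up=True)       # one node lost: still up
assert not has_quorum(1, 2, witness_disk_up=False)  # node AND disk lost: down
```

The assertions show why the quorum disk is no longer a single point of failure: losing it costs only one of three votes.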

Page 19: Hyper v r2 deep dive

Node Majority

Only shared storage devices get votes; replicated storage does not.
Requires 3+ votes, so needs a minimum of three members.

Used when the number of cluster nodes is odd.
Can use replicated storage instead of shared storage. Handy for stretch clusters.

Page 20: Hyper v r2 deep dive

File Share Witness Model

Clustering without the nasty (expensive) shared storage!
– (Sort of… OK… not really…)

One file server can serve as witness for multiple clusters.
Can be used for non-production Hyper-V clusters. (eval/demo only)

Most flexible model for stretch clusters. Eliminates issues of complete site outage.

Page 21: Hyper v r2 deep dive

Witness Disk Model

Nodes get no votes. Only the quorum disk does.
Cluster remains up as long as one node can talk to the witness disk.
Effectively the same as the legacy model. Bad. SPOF. Don't use.

Page 22: Hyper v r2 deep dive

4 Steps to Cluster!

Step 1: Configure shared storage.
– Hardware SAN
– Software SAN à la StarWind iSCSI Target Software
Step 2: Attach Hyper-V hosts to the iSCSI target.
Step 3: Configure Windows Failover Clustering.
Step 4: Configure Hyper-V.

Page 23: Hyper v r2 deep dive

Part III
-VIDEO-
Building a Two-Node Hyper-V Cluster with iSCSI Storage

Page 24: Hyper v r2 deep dive

Part IV
Walking through the Management of a Hyper-V Cluster

Page 25: Hyper v r2 deep dive

Cluster Shared Volumes

Hyper-V v.1 required a single VM per LUN.
v.1's clustering underpinnings weren't aware of the files on a LUN. The "disk" was the cluster resource to fail over.
– Remember that only one node at a time can own a resource.

v.2 adds cluster-awareness to individual volumes.
This means that individual files on a LUN can be owned by different hosts. Hosts respect each other's ownership.

Page 26: Hyper v r2 deep dive

Cluster Shared Volumes

Because NTFS is still the file system, this means creating a meta-system of ownership information.

Each cluster node checks for ownership, respects the ownership of others, and updates the info when it takes over ownership.

Designed for use only by Hyper-V's tiny number of files.
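The ownership meta-system can be pictured as a per-file map sitting alongside the NTFS volume. A toy sketch with made-up paths and host names; real CSV metadata is internal to clustering, not a Python dict:

```python
# Toy model of CSV's ownership metadata: the LUN stays a single NTFS
# volume, but a per-file map records which node owns each VM's files.
# Paths and host names are made up for illustration.
csv_ownership = {
    "Volume1/VM-A.vhd": "HOST1",
    "Volume1/VM-B.vhd": "HOST2",  # different owner, same LUN
}

def take_ownership(path, node):
    """A node records itself as owner (e.g. after failover or live
    migration); the other nodes respect whatever the map says."""
    previous = csv_ownership.get(path)
    csv_ownership[path] = node
    return previous

old_owner = take_ownership("Volume1/VM-A.vhd", "HOST2")
print(old_owner, "->", csv_ownership["Volume1/VM-A.vhd"])  # prints: HOST1 -> HOST2
```

This is why v.2 no longer has to fail over the whole disk resource: ownership moves per file, not per LUN.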

Page 27: Hyper v r2 deep dive

Going Beyond Two Nodes

Windows Failover Clustering gets non-linearly more complex as you add more hosts.
– Complexity arrives in failover options.

Some critical best practices:
– Manage Preferred Owners & Persistent Mode options correctly.
– Consider carefully the effects of Failback.
– Resist creating hybrid clusters that support other services.
– Integrate SCVMM for dramatically improved management.
– Use disk "dependencies" as Affinity/Anti-Affinity rules.
– Add servers in pairs.
– Segregate traffic!!!

Page 28: Hyper v r2 deep dive

Best Practices in Network Segregation

Page 29: Hyper v r2 deep dive

Best Practices in Network Segregation

Page 30: Hyper v r2 deep dive

-DEMO-
Walking through the Management of a Hyper-V Cluster


Page 31: Hyper v r2 deep dive

Part V
Adding Disaster Recovery with Multi-Site Clustering

Page 32: Hyper v r2 deep dive

What Makes a Disaster?

Which of the following would you consider a disaster?

– A naturally-occurring event, such as a tornado, flood, or hurricane, impacts your datacenter and causes damage. That damage causes the entire processing of that datacenter to cease.

– A widespread incident, such as a water leakage or long-term power outage, that interrupts the functionality of your datacenter for an extended period of time.

– A problem with a virtual host creates a “blue screen of death”, immediately ceasing all processing on that server.

– An administrator installs a piece of code that causes problems with a service, shutting down that service and preventing some action from occurring on the server.

– An issue with power connections causes a server or an entire rack of servers to inadvertently and rapidly power down.

Page 33: Hyper v r2 deep dive

What Makes a Disaster?

(Same list as the previous slide, with the verdict overlaid: the first two scenarios are a DISASTER! The other three are JUST A BAD DAY!)

Page 34: Hyper v r2 deep dive

What Makes a Disaster?

Your business' decision to "declare a disaster" and move to "disaster operations" is a major one.

The technologies used for disaster protection are different from those used for HA.
– More complex. More expensive.

Failover and failback processes involve more thought.

Page 35: Hyper v r2 deep dive

What Makes a Disaster?

At a very high level, disaster recovery for virtual environments is three things:
– A storage mechanism
– A replication mechanism
– A set of target servers to receive virtual machines and their data

Page 36: Hyper v r2 deep dive

What Makes a Disaster?

[Diagram: a primary Hyper-V server and its iSCSI storage device replicate to a backup site holding storage device(s), a replication mechanism, and backup Hyper-V target servers]

Page 37: Hyper v r2 deep dive

Storage Device

Typically, two SANs in two different locations.
– Fibre Channel or iSCSI
– Usually a similar model or manufacturer. This is often necessary for the replication mechanism to function properly.

The backup SAN doesn't necessarily need to be the same size as the primary SAN.
– Replicated data isn't always the full set of data.

Page 38: Hyper v r2 deep dive

Replication Mechanism

Replication between SANs can occur…

Synchronously
– Changes are made on one node at a time. Subsequent changes on the primary SAN must wait for an ACK from the backup SAN.

Asynchronously
– Changes on the backup SAN will eventually be written. They are queued at the primary SAN to be transferred at intervals.

Page 39: Hyper v r2 deep dive

Replication Mechanism

Synchronously
– Changes are made on one node at a time. Subsequent changes on the primary SAN must wait for an ACK from the backup SAN.

[Diagram: primary-site iSCSI storage ↔ backup-site iSCSI storage. 1) Change committed at primary site → 2) change replicated to secondary site → 3) change committed at secondary site → 4) acknowledgement of change returned to primary site → 5) change complete]

Page 40: Hyper v r2 deep dive

Replication Mechanism

Asynchronously
– Changes on the backup SAN will eventually be written. They are queued at the primary SAN to be transferred at intervals.

[Diagram: primary-site iSCSI storage → backup-site iSCSI storage. Changes 1–3 commit at the primary site and are replicated to the secondary site in one batch, while change 4 continues committing at the primary site]

Page 41: Hyper v r2 deep dive

Replication Mechanism

Which should you choose…?

Synchronous
– Assures no loss of data.
– Requires a high-bandwidth, low-latency connection.
– Write and acknowledgement latencies impact performance.
– Requires shorter distances between storage devices.

Asynchronous
– Potential for loss of data during a failure.
– Leverages smaller-bandwidth connections; more tolerant of latency.
– No performance impact.
– Potential to stretch across longer distances.

Your Recovery Point Objective makes this decision…
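The data-loss trade-off can be shown with a toy model of the two write paths. Function names and the list-based "SANs" are illustrative only; real replication runs inside SAN firmware:

```python
# Toy contrast of the two write paths; lists stand in for SANs.
def synchronous_write(primary, backup, change):
    primary.append(change)
    backup.append(change)       # must land and ACK before the write returns
    return "acknowledged"       # zero data loss, but every write pays latency

def asynchronous_write(primary, queue, change):
    primary.append(change)      # returns immediately
    queue.append(change)        # shipped to the backup SAN at intervals
    return "committed locally"  # the queue is the Recovery Point exposure

primary, backup, queue = [], [], []
synchronous_write(primary, backup, "change-1")
asynchronous_write(primary, queue, "change-2")

# If the primary site fails right now, the synchronous change survived
# at the backup, but the still-queued asynchronous change is lost.
assert "change-1" in backup
assert "change-2" in queue and "change-2" not in backup
```

The final assertions are the Recovery Point Objective in miniature: whatever sits in the async queue at the moment of failure is the data you lose.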

Page 42: Hyper v r2 deep dive

Replication Mechanism

Replication processing can occur at the…

Storage Layer
– Replication processing is handled by the SAN itself. Often agents are installed on virtual hosts or machines to ensure crash consistency.
– Easier to set up, fewer moving parts. More scalable. Concerns about crash consistency.

OS / Application Layer
– Replication processing is handled by software in the VM OS. This software also operates as the agent.
– More challenging to set up, more moving parts. More installations to manage/monitor. Scalability and cost are linear. Fewer concerns about crash consistency.

Page 43: Hyper v r2 deep dive

The Problem with Transactional Databases

O/S crash consistency is easy to obtain.
– Just quiesce the file system before beginning the replication.

Application crash consistency is much harder.
– Transactional databases like AD, Exchange, and SQL don't quiesce when the file system does.
– Need to stop these databases before quiescence.
– Or, need an agent in the VM that handles DB quiescence.

Replication without crash consistency will lose data. The DB comes back in an "inconsistent" state.

Page 44: Hyper v r2 deep dive

Four-Step Process for VSS

Step 1: A requestor, such as replication software, asks the server to invoke a shadow copy.
Step 2: A provider accepts the request and calls an application-specific writer (SQL, Exchange, etc.) if necessary.
Step 3: The application-specific writer coordinates the system shadow copy with application quiescence to ensure application consistency.
Step 4: The shadow copy is created.

…then the replication can start…
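Those four steps can be mimicked as a sequence of calls to show the ordering. Everything here is hypothetical naming; real VSS is a Windows COM API, and this Python sketch only mirrors the flow:

```python
# Hypothetical sketch mirroring the four VSS steps in order. Real VSS is
# a Windows COM API; none of these class or method names are real.
class ShadowCopySession:
    def __init__(self):
        self.log = []

    def request(self, requestor):              # Step 1: requestor asks
        self.log.append(f"{requestor} requested shadow copy")

    def quiesce(self, apps):                   # Steps 2-3: app-specific
        for app in apps:                       # components flush and hold
            self.log.append(f"{app} quiesced") # writes for consistency

    def snapshot(self):                        # Step 4: copy created
        self.log.append("shadow copy created")

session = ShadowCopySession()
session.request("replication software")
session.quiesce(["SQL", "Exchange"])
session.snapshot()
print("\n".join(session.log))                  # ...then replication starts
```

The ordering is the point: the databases are quiesced before the shadow copy exists, which is what makes the replicated copy application-consistent rather than merely crash-consistent.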


Page 45: Hyper v r2 deep dive

Target Servers & Cluster

Finally, there is a set of target servers in the backup site.

With Hyper-V, these servers are part of a Multi-Site Hyper-V cluster.
– A multi-site cluster is the exact same thing as a single-site cluster, except that it spans multiple sites.
– Some changes to management and configuration tactics are required.

Page 46: Hyper v r2 deep dive

Target Servers & Cluster

[Diagram: Hyper-V servers and iSCSI storage at the primary site connected through network switches to Hyper-V servers and iSCSI storage at the backup site]

Page 47: Hyper v r2 deep dive

Multi-Site Cluster Tactics

Install servers so that your primary site always contains more servers than your backup sites.
– Eliminates some problems with quorum during a site outage.

Page 48: Hyper v r2 deep dive

Multi-Site Cluster Tactics

Leverage Node and File Share Majority quorum when possible.
– Prevents an entire-site outage from impacting quorum.
– Enables creation of multiple clusters if necessary.

[Diagram: primary-site and backup-site Hyper-V servers with iSCSI storage, linked by network switches, plus a witness server at a third site]

Page 49: Hyper v r2 deep dive

Multi-Site Cluster Tactics

Ensure that networking remains available when VMs migrate from the primary to the backup site.
– R2 clustering can now span subnets. This seems like a good thing, but only if you plan correctly for it.
– Remember that crossing subnets also means changing the IP address, subnet mask, gateway, etc., at the new site.
– This can be done automatically using DHCP and dynamic DNS, or must be updated manually.
– DNS replication is also a problem. Clients will require time to update their local cache.
– Consider reducing the DNS TTL or clearing the client cache.
