Hyper-V R2 High-Availability: DEEP DIVE!
Greg Shields, MVP, vExpert
Head Geek, Concentrated Technology
www.ConcentratedTech.com
This slide deck was used in one of our many conference presentations. We hope you enjoy it, and invite you to use it
within your own organization however you like.
For more information on our company, including information on private classes and upcoming conference appearances, please
visit our Web site, www.ConcentratedTech.com.
For links to newly-posted decks, follow us on Twitter: @concentrateddon or @concentratdgreg
This work is copyright ©Concentrated Technology, LLC
Agenda
Part I – Understanding Live Migration's Role in Hyper-V HA
Part II – The Fundamentals of Windows Failover Clustering
Part III – Building a Two-Node Hyper-V Cluster with iSCSI Storage
Part IV – Walking through the Management of a Hyper-V Cluster
Part V – Adding Disaster Recovery with Multi-Site Clustering
Part I: Understanding Live Migration's Role in Hyper-V HA
Do You Really Need HA?
High-availability adds dramatically greater uptime for virtual machines:
– Protection against host failures
– Protection against resource overuse
– Protection against scheduled/unscheduled downtime
High-availability also adds much greater cost:
– Shared storage between hosts
– Connectivity
– Higher (and more expensive) software editions
Not every environment needs HA!
What Really is Live Migration?
Part 1: Protection from Host Failures
What Really is Live Migration?
[Diagram: Live Migration of a VM from an Overloaded Virtual Host, across the Network, to an Underloaded Virtual Host, with both hosts attached to Shared Storage]
Part 2: Load Balancing of VM/host Resources
Comparing Quick w/ Live Migration
Simply put: migration speed is the difference.
– In Hyper-V's original release, a Hyper-V virtual machine could be relocated with "a minimum" of downtime.
– This downtime was directly related to:
…the amount of memory assigned to the virtual machine
…the connection speed between the virtual hosts and shared storage.
– Virtual machines with more assigned virtual memory and slower networks took longer to complete a migration from one host to another.
– Those with less could complete the migration in less time.
With QM, a VM with 2 GB of vRAM could take 32 seconds or longer to migrate! Downtime ensues…
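That 32-second figure can be sanity-checked with simple arithmetic: the VM's full memory must be written out before the target host can resume it. A minimal sketch, assuming a hypothetical effective throughput of ~64 MB/s (an illustrative figure, not a measured one):

```python
# Rough Quick Migration downtime model: the VM's entire memory is saved
# to shared storage before the target host takes over, so downtime is
# roughly memory size divided by effective write throughput.
# The 64 MB/s throughput below is a hypothetical figure for illustration.

def quick_migration_downtime(vram_gb: float, throughput_mb_s: float) -> float:
    """Seconds of downtime: memory size (in MB) over effective throughput."""
    return (vram_gb * 1024) / throughput_mb_s

# A 2 GB VM over a ~64 MB/s path needs about 32 seconds of downtime.
print(quick_migration_downtime(2, 64))  # → 32.0
```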
Comparing Quick w/ Live Migration
Down/dirty details…
– During a Quick Migration, the virtual machine is immediately put into a "Saved" state.
– This state is not a power down, nor is it the same as the Paused state.
– In the Saved state – and unlike pausing – the virtual machine releases its memory reservation on the host machine and stores the contents of its memory pages to disk.
– Once this has completed, the target host can take over ownership of the virtual machine and bring it back into operation.
Comparing Quick w/ Live Migration
Down/dirty details…
– This saving of virtual machine state consumes most of the time involved in a Quick Migration.
– Reducing this delay required a mechanism to pre-copy the virtual machine's memory from source to target host while the VM keeps running.
– During the pre-copy, changes to memory pages are logged. These changes tend to be relatively small in quantity, making the delta copy significantly smaller and faster than the original copy.
– Once the initial copy has completed, Live Migration then:
…pauses the virtual machine
…copies the memory deltas
…transfers ownership to the target host.
Much faster. Effectively "zero" downtime.
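The pre-copy-then-delta idea above can be sketched as a toy model (page counts, names, and data are illustrative, not Hyper-V internals):

```python
# Toy model of Live Migration's pre-copy: copy all memory pages while the
# VM is still running, log which pages were dirtied during that copy,
# then pause the VM only long enough to copy the (much smaller) deltas.

def live_migration(pages: dict, dirtied_during_copy: set) -> tuple:
    target = dict(pages)                       # initial pre-copy, VM running
    deltas = {p: pages[p] for p in dirtied_during_copy}
    # The VM is paused only for this step:
    target.update(deltas)                      # copy the memory deltas
    downtime_pages = len(deltas)               # pause time ∝ delta size
    return target, downtime_pages

pages = {i: f"data{i}" for i in range(1000)}
target, paused_for = live_migration(pages, dirtied_during_copy={3, 7})
print(paused_for)  # → 2: only 2 pages copied while paused, not 1000
```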
Part II: The Fundamentals of Windows Failover Clustering
Why Clustering Fundamentals?
Isn’t this, after all, a workshop on Hyper-V?
It is, but the only way to do highly-available Hyper-V is atop Windows Failover Clustering.
– Many people have given clustering a pass due to early difficulties with its technologies.
– Microsoft did us all a disservice by making every previous version of Failover Clustering ridiculously painful to implement.
– Most IT pros have no experience with clustering.
– …but clustering doesn't have to be hard. It just feels like it does!
Doing clustering badly means doing HA Hyper-V badly!
Clustering's Sordid History
Windows NT 4.0
– Microsoft Cluster Service ("Wolfpack")
– A high-availability service that reduced availability
– "As the corporate expert in Windows clustering, I recommend you don't use Windows clustering."
Windows 2000
– Greater availability and scalability. Still painful.
Windows 2003
– Added iSCSI storage to traditional Fibre Channel
– SCSI Resets still used as a method of last resort (painful)
Windows 2008
– Eliminated use of SCSI Resets
– Eliminated the full-solution HCL requirement
– Added the Cluster Validation Wizard and pre-cluster tests
– First version truly usable by IT generalists
What's New & Changed in 2008
– x64 Enterprise Edition gets up to 16 nodes.
– Backups get VSS support.
– Disks can be brought online without taking dependencies offline. This allows disk extension without downtime.
– GPT disks are supported.
– Cluster self-healing: no longer reliant on disk signatures; multiple paths for identifying "lost" or failed disks.
– IPv6 & DHCP support.
– The Network Name resource now uses DNS instead of WINS.
– The Network Name resource is more resilient: loss of an IP address need not bring it offline.
– Geo-clustering! a.k.a. cross-subnet clustering. Cluster communications use TCP unicast and can span subnets.
So, What IS a Cluster?
[Diagram: cluster nodes attached to a Quorum Drive & Storage for Hyper-V VMs]
Cluster Quorum Models
Ever been to a Kiwanis meeting…?
A cluster "exists" because it has quorum between its members. That quorum is achieved through a voting process.
– Different Kiwanis clubs have different rules for quorum.
– Different clusters have different rules for quorum.
If a cluster "loses quorum", the entire cluster shuts down and ceases to exist until quorum is regained.
– This is much different from a resource failover, which is the reason clusters are implemented in the first place.
Multiple quorum models exist, for different reasons.
Node & Disk Majority
Node & Disk Majority eliminates Win2003's quorum disk as a single point of failure. Works on a "voting system".
– A two-node cluster gets three votes: one for each node and one for the quorum disk.
– Two votes are needed for quorum.
Because of this model, the loss of the quorum disk results in the loss of only one vote.
Used when an even number of nodes are in the cluster.
The most-deployed model in production.
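The voting rule above can be sketched as a toy model (illustrative code, not actual cluster logic):

```python
# Sketch of the Node & Disk Majority voting rule: each surviving node
# contributes one vote, the witness (quorum) disk one more, and the
# cluster keeps quorum while live votes form a strict majority.

def has_quorum(total_nodes: int, live_nodes: int, disk_online: bool) -> bool:
    total_votes = total_nodes + 1            # every node + the witness disk
    votes = live_nodes + (1 if disk_online else 0)
    return votes > total_votes // 2          # strict majority required

# Two-node cluster: 3 votes total, 2 needed.
print(has_quorum(2, 2, disk_online=False))   # → True: lost only the disk
print(has_quorum(2, 1, disk_online=True))    # → True: lost only one node
print(has_quorum(2, 1, disk_online=False))   # → False: lost a node AND the disk
```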
Node Majority
Only shared storage devices get votes; replicated storage does not.
– Requires 3+ votes, so needs a minimum of three members.
– Used when the number of cluster nodes is odd.
– Can use replicated storage instead of shared storage. Handy for stretch clusters.
File Share Witness Model
Clustering without the nasty (expensive) shared storage!
– (Sort of… OK… not really…)
One file server can serve as witness for multiple clusters.
Can be used for non-production Hyper-V clusters (eval/demo only).
The most flexible model for stretch clusters: eliminates issues of a complete site outage.
Witness Disk Model
Nodes get no votes; only the quorum disk does.
The cluster remains up as long as one node can talk to the witness disk.
Effectively the same as the legacy model. Bad. A single point of failure. Don't use it.
4 Steps to Cluster!
Step 1: Configure shared storage.
– Hardware SAN
– Software SAN, a la StarWind iSCSI Target Software
Step 2: Attach the Hyper-V hosts to the iSCSI target.
Step 3: Configure Windows Failover Clustering.
Step 4: Configure Hyper-V.
Part III: -VIDEO- Building a Two-Node Hyper-V Cluster with iSCSI Storage
Part IV: Walking through the Management of a Hyper-V Cluster
Cluster Shared Volumes
Hyper-V v.1 required a single VM per LUN.
– v.1's clustering underpinnings weren't aware of the files on a LUN. The "disk" was the cluster resource to fail over.
– Remember that only one node at a time can own a resource.
v.2 adds cluster-awareness to individual volumes.
– This means that individual files on a LUN can be owned by different hosts. Hosts respect each other's ownership.
Cluster Shared Volumes
Because NTFS is still the file system, this means creating a meta-system of ownership information.
– Each cluster node checks for ownership, respects the ownership of others, and updates the info when it takes over ownership.
– Designed for use only by Hyper-V's tiny number of files.
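A minimal sketch of that ownership meta-system (class and method names are illustrative stand-ins, not the actual CSV implementation):

```python
# Toy model of CSV's per-file ownership map on a shared LUN: nodes check
# the map before touching a file, respect other nodes' ownership, and
# update the map when they take a file over (e.g. after a VM migrates).

class SharedVolume:
    def __init__(self):
        self.owner = {}                        # file path -> owning node

    def can_write(self, node: str, path: str) -> bool:
        # A node may write only files it owns (unowned files are free).
        return self.owner.get(path, node) == node

    def take_ownership(self, node: str, path: str):
        self.owner[path] = node                # recorded in shared metadata

csv = SharedVolume()
csv.take_ownership("HostA", "vm1.vhd")
csv.take_ownership("HostB", "vm2.vhd")
print(csv.can_write("HostB", "vm1.vhd"))       # → False: HostA owns it
print(csv.can_write("HostA", "vm1.vhd"))       # → True
```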
Going Beyond Two Nodes
Windows Failover Clustering gets non-linearly more complex as you add more hosts.
– Complexity arrives in failover options.
Some critical best practices:
– Manage Preferred Owners & Persistent Mode options correctly.
– Consider carefully the effects of Failback.
– Resist creating hybrid clusters that support other services.
– Integrate SCVMM for dramatically improved management.
– Use disk "dependencies" as Affinity/Anti-Affinity rules.
– Add servers in pairs.
– Segregate traffic!!!
Best Practices in Network Segregation
-DEMO- Walking through the Management of a Hyper-V Cluster
Part V: Adding Disaster Recovery with Multi-Site Clustering
What Makes a Disaster?
Which of the following would you consider a disaster?
– A naturally-occurring event, such as a tornado, flood, or hurricane, impacts your datacenter and causes damage. That damage causes the entire processing of that datacenter to cease.
– A widespread incident, such as a water leakage or long-term power outage, that interrupts the functionality of your datacenter for an extended period of time.
– A problem with a virtual host creates a “blue screen of death”, immediately ceasing all processing on that server.
– An administrator installs a piece of code that causes problems with a service, shutting down that service and preventing some action from occurring on the server.
– An issue with power connections causes a server or an entire rack of servers to inadvertently and rapidly power down.
What Makes a Disaster?
Which of the following would you consider a disaster?
[Reveal over the same list: DISASTER! …or JUST A BAD DAY!]
What Makes a Disaster?
Your business's decision to "declare a disaster" and move to "disaster operations" is a major one.
The technologies used for disaster protection are different from those used for HA.
– More complex. More expensive.
– Failover and failback processes involve more thought.
What Makes a Disaster?
At a very high level, disaster recovery for virtual environments is three things:
– A storage mechanism
– A replication mechanism
– A set of target servers to receive virtual machines and their data
What Makes a Disaster?
[Diagram: Primary Hyper-V Servers and an iSCSI storage device at the primary site, a replication mechanism between the storage devices, and Backup Hyper-V Servers (the target servers) with their own iSCSI storage device at the backup site]
Storage Device
Typically, two SANs in two different locations.
– Fibre Channel or iSCSI
– Usually a similar model or manufacturer. This is often necessary for the replication mechanism to function properly.
The backup SAN doesn't necessarily need to be the same size as the primary SAN.
– Replicated data isn't always the full set of data.
Replication Mechanism
Replication between SANs can occur…
Synchronously
– Changes are made on one node at a time. Subsequent changes on the primary SAN must wait for an ACK from the backup SAN.
Asynchronously
– Changes on the backup SAN will eventually be written. They are queued at the primary SAN to be transferred at intervals.
Replication Mechanism
Synchronously
– Subsequent changes on the primary SAN must wait for an ACK from the backup SAN.
[Diagram sequence between the primary-site and backup-site iSCSI storage devices: change committed at primary site → change replicated to secondary site → change committed at secondary site → acknowledgement of change returned to primary site → change complete]
Replication Mechanism
Asynchronously
– Changes are queued at the primary SAN and transferred to the backup SAN at intervals.
[Diagram sequence between the primary-site and backup-site iSCSI storage devices: changes 1–3 committed at primary site → changes replicated to secondary site → change 4 committed at primary site]
Replication Mechanism
Which should you choose…?
Synchronous
– Assures no loss of data.
– Requires a high-bandwidth, low-latency connection.
– Write and acknowledgement latencies impact performance.
– Requires shorter distances between storage devices.
Asynchronous
– Potential for loss of data during a failure.
– Leverages smaller-bandwidth connections; more tolerant of latency.
– No performance impact.
– Potential to stretch across longer distances.
Your Recovery Point Objective makes this decision…
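The trade-off above can be sketched as a toy model (the class names and change records are made up for illustration):

```python
# Toy comparison of the two replication modes: synchronous commits block
# until the backup SAN acknowledges; asynchronous commits return at once
# and queue changes for later transfer, risking loss on a failure.

class SyncSAN:
    def __init__(self): self.primary, self.backup = [], []
    def commit(self, change):
        self.primary.append(change)
        self.backup.append(change)         # replicate; caller waits for ACK
        return "ack"                       # only now does the write complete

class AsyncSAN:
    def __init__(self): self.primary, self.backup, self.queue = [], [], []
    def commit(self, change):
        self.primary.append(change)
        self.queue.append(change)          # transferred later, at intervals
        return "ack"                       # returns immediately
    def flush(self):                       # the periodic replication cycle
        self.backup.extend(self.queue); self.queue.clear()

a = AsyncSAN()
a.commit("change1"); a.commit("change2")
print(len(a.backup))  # → 0: a failure right now loses both queued changes
a.flush()
print(len(a.backup))  # → 2: after the transfer interval, the backup catches up
```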
Replication Mechanism
Replication processing can occur at the…
Storage layer
– Replication processing is handled by the SAN itself. Often, agents are installed on virtual hosts or machines to ensure crash consistency.
– Easier to set up; fewer moving parts. More scalable. Concerns about crash consistency.
OS / application layer
– Replication processing is handled by software in the VM OS. This software also operates as the agent.
– More challenging to set up; more moving parts. More installations to manage/monitor. Scalability and cost are linear. Fewer concerns about crash consistency.
The Problem with Transactional Databases
O/S crash consistency is easy to obtain.
– Just quiesce the file system before beginning the replication.
Application crash consistency is much harder.
– Transactional databases like AD, Exchange, and SQL don't quiesce when the file system does.
– You need to stop these databases before quiescence.
– Or, you need an agent in the VM that handles DB quiescence.
Replication without crash consistency will lose data.
– The DB comes back in an "inconsistent" state.
Four-Step Process for VSS
Step 1: A requestor, such as replication software, requests the server to invoke a shadow copy.
Step 2: A provider accepts the request and calls an application-specific provider (SQL, Exchange, etc.) if necessary.
Step 3: The application-specific provider coordinates the system shadow copy with app quiescence to ensure application consistency.
Step 4: The shadow copy is created.
…then the replication can start…
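The four steps above can be sketched as follows; the class and method names are illustrative stand-ins, not the real VSS COM interfaces:

```python
# Toy model of the four-step VSS flow: requestor → provider →
# application-specific provider (quiesce) → shadow copy created.

class AppProvider:                            # stand-in for a SQL/Exchange writer
    def quiesce(self, log): log.append("app quiesced")

class VssProvider:
    def handle(self, app, log):
        if app: app.quiesce(log)              # Step 3: ensure app consistency
        log.append("shadow copy created")     # Step 4

def request_shadow_copy(app=None):
    log = ["requestor asked for shadow copy"] # Step 1: e.g. replication software
    VssProvider().handle(app, log)            # Step 2: provider accepts the call
    return log

print(request_shadow_copy(AppProvider()))
# → ['requestor asked for shadow copy', 'app quiesced', 'shadow copy created']
```

With no application-specific provider, the quiescence step is simply skipped and only the file-system-level shadow copy is taken.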
Target Servers & Cluster
Finally, a set of target servers in the backup site.
With Hyper-V, these servers are part of a Multi-Site Hyper-V cluster.
– A multi-site cluster is exactly the same as a single-site cluster, except that it spans multiple sites.
– Some changes to management and configuration tactics are required.
Target Servers & Cluster
[Diagram: Hyper-V servers and iSCSI storage at the primary site and the backup site, connected through redundant network switches]
Multi-Site Cluster Tactics
Install servers so that your primary site always contains more servers than your backup sites.
– Eliminates some problems with quorum during a site outage.
Multi-Site Cluster Tactics
Leverage Node and File Share Majority quorum when possible.
– Prevents an entire-site outage from impacting quorum.
– Enables creation of multiple clusters if necessary.
[Diagram: Hyper-V servers and iSCSI storage at the primary and backup sites, plus a Witness Server at a third witness site]
Multi-Site Cluster Tactics
Ensure that networking remains available when VMs migrate from the primary to the backup site.
– R2 clustering can now span subnets. This seems like a good thing, but only if you plan correctly for it.
– Remember that crossing subnets also means changing the IP address, subnet mask, gateway, etc., at the new site.
– This can be done automatically using DHCP and dynamic DNS, or it must be updated manually.
– DNS replication is also a problem. Clients will require time to update their local cache.
– Consider reducing the DNS TTL or clearing the client cache.