The Shortcut Guide to Architecting iSCSI Storage for Microsoft Hyper-V


Greg Shields


Introduction to Realtime Publishers

by Don Jones, Series Editor

For several years now, Realtime has produced dozens and dozens of high‐quality books that just happen to be delivered in electronic format, at no cost to you, the reader. We’ve made this unique publishing model work through the generous support and cooperation of our sponsors, who agree to bear each book’s production expenses for the benefit of our readers.

Although we’ve always offered our publications to you for free, don’t think for a moment that quality is anything less than our top priority. My job is to make sure that our books are as good as—and in most cases better than—any printed book that would cost you $40 or more. Our electronic publishing model offers several advantages over printed books: You receive chapters literally as fast as our authors produce them (hence the “realtime” aspect of our model), and we can update chapters to reflect the latest changes in technology.

I want to point out that our books are by no means paid advertisements or white papers. We’re an independent publishing company, and an important aspect of my job is to make sure that our authors are free to voice their expertise and opinions without reservation or restriction. We maintain complete editorial control of our publications, and I’m proud that we’ve produced so many quality books over the past years.

I want to extend an invitation to visit us at http://nexus.realtimepublishers.com, especially if you’ve received this publication from a friend or colleague. We have a wide variety of additional books on a range of topics, and you’re sure to find something that’s of interest to you (and it won’t cost you a thing). We hope you’ll continue to come to Realtime for your educational needs far into the future.

Until then, enjoy.

Don Jones

http://nexus.realtimepublishers.com/


Introduction to Realtime Publishers

Chapter 1: The Power of iSCSI in Microsoft Virtualization
The Goal for SAN Availability Is “No Nines”
Hyper‐V Is Exceptionally Dependent on Storage
VHD Attachment to VM
Pass‐Through Disks
iSCSI Direct Attachment
VM‐to‐VM Clustering
Host Boot from SAN
Guest Boot from SAN
VM Performance Depends on Storage Performance
Network Contention
Connection Redundancy & Aggregation
Type and Rotation Speed of Drives
Spindle Contention
Connection Medium & Administrative Complexity
iSCSI Makes Sense for Hyper‐V Environments

Chapter 2: Creating Highly‐Available Hyper‐V with iSCSI Storage
The Windows iSCSI Initiator: A Primer
NIC Teaming
MCS
MPIO
Which Option Should You Choose?
Getting to High Availability with Hyper‐V
Single Server, Redundant Connections
Single Server, Redundant Path
Hyper‐V Cluster, Minimal Configuration
Hyper‐V Cluster, Redundant Connections
Hyper‐V Cluster, Redundant Path
High Availability Scales with Your Pocketbook

Chapter 3: Critical Storage Capabilities for Highly‐Available Hyper‐V
Virtual Success Is Highly Dependent on Storage
Modular Node Architecture
Redundant Storage Processors Per Node
Redundant Network Connections and Paths
Disk‐to‐Disk RAID
Node‐to‐Node RAID
Integrated Offsite Replication for Disaster Recovery
Non‐Interruptive Capacity for Administrative Actions
Volume Activities
Storage Node Activities
Data Activities
Firmware Activities
Storage Virtualization
Snapshotting and Cloning
Backup and Restore with VSS Integration
Volume Rollback
Thin Provisioning
Storage Architecture and Management Is Key to Hyper‐V

Chapter 4: The Role of Storage in Hyper‐V Disaster Recovery
Defining “Disaster”
Defining “Recovery”
The Importance of Replication, Synchronous and Asynchronous
Synchronous Replication
Asynchronous Replication
Which Should You Choose?
Recovery Point Objective
Distance Between Sites
Ensuring Data Consistency
Architecting Disaster Recovery for Hyper‐V
Choosing the Right Quorum
Node and Disk Majority
Disk Only Majority
Node Majority
Node and File Share Majority
Ensuring Network Connectivity and Resolution
Disaster Recovery Is Finally Possible with Hyper‐V Virtualization

Copyright Statement

© 2010 Realtime Publishers. All rights reserved. This site contains materials that have been created, developed, or commissioned by, and published with the permission of, Realtime Publishers (the “Materials”) and this site and any such Materials are protected by international copyright and trademark laws.

THE MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice and do not represent a commitment on the part of Realtime Publishers or its web site sponsors. In no event shall Realtime Publishers or its web site sponsors be held liable for technical or editorial errors or omissions contained in the Materials, including without limitation, for any direct, indirect, incidental, special, exemplary or consequential damages whatsoever resulting from the use of any information contained in the Materials.

The Materials (including but not limited to the text, images, audio, and/or video) may not be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any way, in whole or in part, except that one copy may be downloaded for your personal, non-commercial use on a single computer. In connection with such use, you may not modify or obscure any copyright or other proprietary notice.

The Materials may contain trademarks, service marks and logos that are the property of third parties. You are not permitted to use these trademarks, service marks or logos without prior written consent of such third parties.

Realtime Publishers and the Realtime Publishers logo are registered in the US Patent & Trademark Office. All other product or service names are the property of their respective owners.

If you have any questions about these terms, or if you would like information about licensing materials from Realtime Publishers, please contact us via e-mail at [email protected].


[Editor's Note: This eBook was downloaded from Realtime Nexus: The Digital Library for IT Professionals. All leading technology eBooks and guides from Realtime Publishers can be found at http://nexus.realtimepublishers.com.]

Chapter 1: The Power of iSCSI in Microsoft Virtualization

Virtualization is one of the hottest technologies to hit IT in years, with Microsoft’s Hyper‐V R2 release igniting those flames even further. Hyper‐V arrives as a cost‐effective virtualization solution that can be easily implemented by even the newest of technology generalists.

But while Hyper‐V itself is simple to implement, ensuring its highest levels of redundancy, availability, and, most importantly, performance is not. Because virtualization relies so heavily on storage, two of the most critical decisions you will make in implementing Hyper‐V are where and how you’ll store your virtual machines (VMs).

Virtualization solutions such as Hyper‐V enable many fantastic optimizations for the IT environment: VMs can be easily backed up and restored in whole, making affordable server restoration and disaster recovery possible. VM processing can be load balanced across any number of hosts, ensuring that you’re getting the most value out of your server hardware dollars. VMs themselves can be rapidly deployed, snapshotted, and reconfigured as needed, to gain levels of operational agility never before seen in IT.

Yet at the same time virtualization also adds levels of complexity to the IT environment. Gone are the traditional notions of the physical server “chassis” and its independent connections to networks and storage. Replacing this old mindset are new approaches that leverage the network itself as the transmission medium for storage. With the entry of enterprise‐worthy iSCSI solutions into the market, IT environments of all sizes can leverage the very same network infrastructure they’ve built over time to host their storage as well. This already‐present network pervasiveness combined with the dynamic nature of virtualization makes iSCSI a perfect fit for your storage needs.

Correctly connecting all the pieces, however, is the challenge. To help, this guide digs deep into the decisions that environments large and small must consider. It looks at best practices for Hyper‐V storage topologies and technologies, as well as cost and manageability implications for the solutions available on the market today. This chapter and the next focus on the technical architectures required to create a highly‐available Hyper‐V infrastructure. In Chapter 2, you’ll be impressed to discover just how many ways redundancy can be added inexpensively to a Hyper‐V environment using native tools alone.


If, like many, your storage experience is thus far limited to the disks you plug directly into your servers, you’ll be surprised at the capabilities today’s iSCSI solutions offer. Whereas Chapters 1 and 2 deal with the interconnections between server and storage, Chapter 3 focuses exclusively on capabilities within the storage itself. With features such as automatic restriping, thin provisioning, and built‐in replication, today’s iSCSI storage delivers enterprise capabilities in a low‐cost form factor.

Finally, no storage discussion is fully complete without a look at the affordable disaster recovery options made available by virtualizing. Chapter 4 discusses how iSCSI’s backup, replication, and restore capabilities make disaster recovery solutions (and not just plans) a real possibility for everyone.

But before we delve into those topics, we first need to start with your SAN architecture itself. That architecture can arguably be the center of your entire IT infrastructure.

The Goal for SAN Availability Is “No Nines”

It has been stated in the industry that “The goal for SAN availability is ‘no nines’ or 100% availability.” This is absolutely true in environments where data loss or non‐availability has a recognizable impact on the bottom line. If your business loses thousands of dollars for every second its data is not available, you’d better have a storage system that never, ever goes down.
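To put the “no nines” goal in perspective, each additional nine of availability cuts the permitted downtime by a factor of ten. The following Python sketch shows the arithmetic. It assumes a 365-day year and ignores planned maintenance windows, and the function name is illustrative only.

```python
# Yearly downtime budget implied by an availability target.
# Assumes a 365-day year; leap years and planned maintenance are ignored.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def downtime_seconds_per_year(availability_pct: float) -> float:
    """Return the seconds of downtime per year allowed by a target percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return SECONDS_PER_YEAR * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999, 100.0):
        minutes = downtime_seconds_per_year(target) / 60
        print(f"{target}% available -> {minutes:.2f} minutes of downtime per year")
```

Run as a script, it reports roughly 525.6 minutes per year at 99.9%, about 52.6 minutes at 99.99%, about 5.3 minutes at 99.999%, and zero at 100%.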

While such a goal would be laughable if applied to general‐purpose operating systems (OSs) such as Microsoft Windows, 100% availability is not unheard of in the specialized hardware solutions that make up today’s SANs. No matter which company builds your SAN or which medium it uses to transfer data, its single‐purpose mission means that multiple layers of redundancy can be built into its hardware:

• Multiple power supplies mean that no single cable or power input loss can cause a failure.

• Multiple and redundant connections between servers and storage ensure that a connection loss can be survived.

• Redundant pathing through completely‐isolated equipment further protects against connection loss by providing an entirely separate path in the case of a downstream failure.

• RAID configurations ensure that the loss of a single disk drive will not cause the loss of an entire volume of data.

• Advanced RAID configurations further protect against drive loss by ensuring that even multiple, simultaneous drive failures will not impact availability.

• Data striping across storage nodes creates the ultimate protection by preserving availability even after the complete loss of an entire storage node.
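The RAID bullets in the list above rest on a simple mechanism. As a minimal sketch, assuming single-parity protection of the kind RAID 5 uses, the Python below computes a parity block as the XOR of the data blocks and then rebuilds a lost block from the survivors. Real arrays rotate parity across drives and work on far larger stripes, and double-parity schemes such as RAID 6, which survive two simultaneous drive failures, need more than XOR alone. The helper names are hypothetical.

```python
# Minimal illustration of single-parity protection (the idea behind RAID 5):
# parity is the XOR of the data blocks in a stripe, so any one missing block
# can be rebuilt by XOR-ing all of the surviving blocks together.
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes]) -> bytes:
    """Recover the single missing block (data or parity) from the survivors."""
    return xor_blocks(surviving)


if __name__ == "__main__":
    data = [b"VM01", b"VM02", b"VM03"]   # one stripe across three data drives
    parity = xor_blocks(data)            # stored on a fourth drive
    # Simulate losing the second data drive and rebuilding its contents.
    rebuilt = rebuild_missing([data[0], data[2], parity])
    print(rebuilt)  # b'VM02'
```

The same rebuild works whichever single drive fails, including the parity drive itself, which is why one drive loss does not cost the volume.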



All these redundant technologies are laid in place because a business’ data is its most critical asset. Whether that data is contained within Microsoft Office documents or high‐performance databases, any loss of data is fundamentally critical to a business’ operations.

Yet a business’ data is only one facet of the IT environment. That data is useless without the applications that work with it and create meaning out of its bits and bytes. In a traditional IT environment, those applications run atop individual physical servers, with OSs and applications often installed to their own local direct‐attached storage. While your applications’ data might sit within a highly‐available SAN, the thousands of files that comprise each OS and its applications usually remain local.

With virtualization, everything changes. Moving that same environment to virtualization encapsulates each server’s OS and its applications into a virtual disk. That virtual disk is then stored within the very same SAN infrastructure as your business data. As a result, making the move to virtualization effectively elevates your run‐of‐the‐mill OS and application files to the same criticality as your business data.

Hyper‐V Is Exceptionally Dependent on Storage

Let’s take a look at the multiple ways in which this new criticality occurs. Figure 1.1 depicts a highly simplified representation of a two‐node Hyper‐V cluster. In this cluster, each server connects via one or more interfaces to the environment’s network infrastructure. Through that network, the VMs atop each server communicate with clients to provide their assigned services.

It is important to recognize that high availability in Hyper‐V, as in most virtualization platforms, requires some form of shared storage that is accessible to every host in the cluster.



Figure 1.1: Highly‐available Hyper‐V at its simplest.

That shared storage is the location where Hyper‐V’s VMs reside. This is the case because today’s VM high‐availability technologies never actually move the VM’s disk file. Whether the transfer of ownership between two hosts occurs as a live migration with a running VM or a re‐hosting after a physical host failure, the high‐availability relocation of a VM only moves the processing and not the storage of that VM.

It is for this reason that the storage component of any Hyper‐V cluster is its most critical element. Every VM sits within that storage, every Hyper‐V host connects to it, and all the processing of your data center’s applications and data is now centralized onto that single device.

Yet this is only the simplest of ways in which a Hyper‐V cluster interacts with its storage. Remember that iSCSI is in effect an encapsulation of traditional SCSI commands into network‐routable packets. This encapsulation means that wherever your network exists, so can your storage. As a result, there are a number of additional ways in which virtual hosts and machines can connect to their needed storage. Let’s take a look through a few that relate specifically to Hyper‐V’s VMs. You’ll find that not all options for connecting VMs to storage are created alike.
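To make that encapsulation concrete, here is a simplified Python sketch based on the SCSI Command PDU layout in the iSCSI specification (RFC 7143). It packs an ordinary SCSI READ(10) command into the 48-byte header that an initiator sends to a target over TCP. The function names and sample values are illustrative. A real initiator also handles login, sequence numbering, header and data digests, and the data the target returns.

```python
# Simplified sketch: wrapping a SCSI READ(10) command in the 48-byte Basic
# Header Segment (BHS) of an iSCSI SCSI Command PDU, following the layout in
# RFC 7143. Login, digests, additional header segments, and data are omitted.
import struct

ISCSI_PORT = 3260          # well-known iSCSI TCP port
OP_SCSI_COMMAND = 0x01     # iSCSI opcode for a SCSI Command PDU
FLAG_FINAL = 0x80          # F bit: no unsolicited data follows
FLAG_READ = 0x40           # R bit: data is expected back from the target
ATTR_SIMPLE = 0x01         # "Simple" task attribute
SCSI_READ_10 = 0x28        # SCSI READ(10) operation code


def read10_cdb(lba: int, blocks: int) -> bytes:
    """Build a 10-byte READ(10) CDB, zero-padded to the 16-byte BHS field."""
    # opcode, flags, 32-bit LBA, group number, 16-bit block count, control
    cdb = struct.pack(">BBIBHB", SCSI_READ_10, 0, lba, 0, blocks, 0)
    return cdb.ljust(16, b"\x00")


def scsi_command_bhs(lun: int, task_tag: int, cmd_sn: int, exp_stat_sn: int,
                     lba: int, blocks: int, block_size: int = 512) -> bytes:
    """Pack the BHS for a READ(10) aimed at a LUN numbered 0-255."""
    # Peripheral device addressing: first byte 0, second byte the LUN number.
    lun_field = struct.pack(">BB6x", 0, lun)
    return struct.pack(
        ">BBHB3s8sIIII16s",
        OP_SCSI_COMMAND,
        FLAG_FINAL | FLAG_READ | ATTR_SIMPLE,
        0,                          # reserved
        0,                          # TotalAHSLength: no extra headers
        b"\x00\x00\x00",            # DataSegmentLength: no immediate data
        lun_field,
        task_tag,                   # Initiator Task Tag
        blocks * block_size,        # Expected Data Transfer Length (bytes)
        cmd_sn,                     # CmdSN
        exp_stat_sn,                # ExpStatSN
        read10_cdb(lba, blocks),    # the encapsulated SCSI command
    )


if __name__ == "__main__":
    header = scsi_command_bhs(lun=0, task_tag=1, cmd_sn=1, exp_stat_sn=1,
                              lba=2048, blocks=8)
    print(f"{len(header)}-byte header for TCP port {ISCSI_PORT}: {header.hex()}")
```

Because the SCSI command travels as ordinary TCP payload, any network that can route TCP traffic between initiator and target can carry it.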

VHD Attachment to VM

Creating a new VM requires assigning its needed resources. Those resources include one or more virtual processors, a quantity of RAM, any peripheral connections, and the disk files that contain its data. Any created VM requires at a minimum a single virtual hard disk (VHD) to become its storage location.



Although a single VHD is the minimum, it is possible to attach additional VHDs to a VM either during its creation or at any point thereafter. Each newly‐attached VHD becomes yet another drive on the VM. Figure 1.2 shows how a second VHD, stored at G:\Second Virtual Hard Disk.vhd, has been connected to the VM named \\vm1.

Figure 1.2: Attaching a second VHD to an existing VM.

Attached VHDs are useful because they retain the encapsulation of system files into their single .VHD file. This encapsulation makes them portable, enabling them to be disconnected from one VM and attached to another at any point. As VHDs, they can also be backed up as a single file using backup software that is installed to the Hyper‐V host, making their single‐file restore possible.

However, VHDs can be problematic when backup software requires direct access to disks for proper backups or individual file and folder restores. Also, some applications require an in‐band and unfiltered SCSI connection to connected disks. These applications, while rare, will not work with attached VHD files. Lastly, VHDs attached to a VM’s virtual IDE controller can only be connected or disconnected when the VM is powered down, forcing any change to involve a period of downtime for the server; Hyper‐V R2 supports hot add and removal only for disks on the virtual SCSI controller.

VHDs can be created with a pre‐allocated fixed size or can be configured to dynamically expand as data is added to the VM. All VHDs are limited to 2040GB (or just shy of 2TB) in size. Dynamically expanding VHDs obviously reduce the initial amount of disk space consumed by the freshly‐created VM. However, care must be taken when collocating multiple dynamically‐expanding VHD files on a single volume, as the combined configured maximum sizes of those VHDs will often be greater than the size of the volume itself. Proactive monitoring must be put in place to watch for and alert on storage growth when dynamically‐expanding VHDs are used.
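The proactive monitoring described above boils down to two comparisons: the sum of the VHDs' configured maximum sizes against the volume's capacity (overcommitment), and current consumption against an alert threshold. The Python sketch below shows both checks, plus the 2040GB VHD ceiling. The VHD names, sizes, and 80% threshold are hypothetical; in practice the sizes would come from your monitoring tools rather than being typed in.

```python
# Sketch of an overcommit and growth check for dynamically expanding VHDs
# sharing one volume. All names, sizes, and thresholds are hypothetical.
from dataclasses import dataclass

VHD_MAX_GB = 2040  # VHD format size ceiling


@dataclass
class DynamicVhd:
    name: str
    max_gb: int       # configured maximum size
    current_gb: int   # space the file consumes today


def check_volume(vhds: list[DynamicVhd], volume_gb: int,
                 alert_pct: float = 80.0) -> list[str]:
    """Return warnings about VHD size limits, overcommit, and volume usage."""
    warnings = []
    for vhd in vhds:
        if vhd.max_gb > VHD_MAX_GB:
            warnings.append(f"{vhd.name}: exceeds the {VHD_MAX_GB}GB VHD limit")
    committed = sum(v.max_gb for v in vhds)   # space promised to the VMs
    used = sum(v.current_gb for v in vhds)    # space actually consumed
    if committed > volume_gb:
        ratio = committed / volume_gb
        warnings.append(f"overcommitted: {committed}GB promised on a "
                        f"{volume_gb}GB volume ({ratio:.1f}x)")
    if used / volume_gb * 100 >= alert_pct:
        warnings.append(f"volume {used / volume_gb:.0%} full; VHD growth may fail")
    return warnings


if __name__ == "__main__":
    vhds = [DynamicVhd("web01.vhd", 500, 120),
            DynamicVhd("sql01.vhd", 1500, 610),
            DynamicVhd("file01.vhd", 2040, 150)]
    for warning in check_volume(vhds, volume_gb=1000):
        print(warning)
```

In the example, 4040GB of maximum size is promised on a 1000GB volume (a 4.0x overcommitment) and the volume is already 88% full, so both warnings fire.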



The expected performance of fixed and dynamic VHD files differs only slightly when using Hyper‐V R2, with fixed disks performing slightly better than those created as dynamic. Dynamic VHD files incur an overhead during write operations that expand the VHD’s size, causing a slight reduction in performance compared with fixed disks. Microsoft testing suggests that fixed VHDs see performance that is equal to native disk performance when run atop Hyper‐V R2. Dynamically expanding VHDs achieve between 85% and 94% of native performance, depending on the type of write operations being done within the VM.

Your decision about whether to use fixed versus dynamic VHDs will depend on your need for slightly better performance versus your available quantity of storage. Consumed storage, however, does represent a cost. As you’ll discover in Chapter 3, the capability for thin‐provisioning VM storage often outweighs any slight improvements in performance.

Pass‐Through Disks

An alternative approach to pulling extra disks into a VM is through the creation of a pass‐through disk. With this approach, an iSCSI disk is exposed to the Hyper‐V host and then passed through from the host to one of its hosted VMs. Because the disk is passed through rather than encapsulated into a VHD, its contents remain in their native format. This allows certain types of backup and other software to maintain direct access to the disk using native SCSI commands. As essentially raw mappings, pass‐through disks also eliminate the 2040GB size limitation of VHDs, which can be a problem for very large file stores or databases.

Microsoft suggests that pass‐through disks achieve levels of performance that are equivalent to connected VHD files. Pass‐through disks can also be leveraged in clustered Hyper‐V scenarios by creating the disk as a clustered resource after assigning it to a VM.

Figure 1.3 shows how a pass‐through disk is created between a host and one of its hosted VMs. Here, as in the previous example, the disk is attached while the VM is powered off. In this image, the host’s Disk Management console is displayed on the left with a 256MB disk that has been attached via iSCSI and initialized by the host. Once initialized, the disk is taken offline and made available to the VM through its settings wizard on the right. There, the VM’s second disk drive is attached to the passed‐through hard drive, which is labeled Disk 4 0.25 GB Bus 0 Lun 0 Target 3.



Figure 1.3: Creating a pass‐through disk.

Because they are not encapsulated into VHDs, pass‐through disks cannot be snapshotted by Hyper‐V. However, because the files reside on‐disk in a native format, your storage solution may be able to complete the snapshot from its own perspective. This storage‐level snapshot can enable advanced storage‐level management functions such as replication, backup and restore, and volume‐level cloning.

iSCSI Direct Attachment

Pass‐through disks can be an obvious choice when applications require that direct mapping. Yet creating pass‐through disks adds a layer of complexity that needn’t be present when there aren’t specific application requirements. A third option that makes sense for most environments is the direct attachment of iSCSI‐based volumes right into the VM. This process uses the VM’s iSCSI Initiator to create and manage connections to iSCSI disks.

Note: Because direct attachment uses the VM’s iSCSI Initiator, this process works only with an iSCSI SAN. Environments that use Fibre Channel SANs cannot take advantage of this option and must resort to using pass‐through disks.



Figure 1.4 shows how the iSCSI Initiator for VM \\vm1 is instead configured to connect directly to the previous example’s 256MB disk. This connection is possible because of iSCSI’s network pervasiveness. Further, the iSCSI Initiator runs as its own service that is independent of the virtualization infrastructure, with the VM’s connection to its iSCSI disk being completely isolated from the host.

Figure 1.4: Connecting directly to an iSCSI LUN from within a VM.

iSCSI direct attachment enables the highest levels of portability for network‐attached disks, retaining all the desired capabilities of the previous examples but without their limitations. Disks can be connected and disconnected at will without the need to reboot the VM. As with VHDs, disks from one VM can be easily attached to another VM should the need arise; and similar to pass‐through disks, data that is contained within the disk remains in its native format.
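Because the VM runs its own iSCSI Initiator, it presents its own iSCSI Qualified Name (IQN) to the SAN, separate from the Hyper-V host's IQN. That is what allows a disk to be granted to a VM, or moved to another VM, without the disk ever being presented to the host. The Python sketch below loosely validates the IQN format (iqn.yyyy-mm.reversed-domain, optionally followed by a colon and a unique string) and models a target's access list as a simple dictionary. The iqn.1991-05.com.microsoft prefix is the Microsoft iSCSI Initiator's default naming; the host, VM, and LUN names are hypothetical, and real SANs manage access lists in their own management tools.

```python
# Sketch: iSCSI initiators identify themselves by IQN, and a target grants
# each LUN only to listed IQNs. Host, VM, and LUN names are hypothetical.
import re

# iqn.<yyyy-mm>.<reversed domain>[:<unique name>] (a loose check only)
IQN_PATTERN = re.compile(r"^iqn\.\d{4}-\d{2}\.[a-z0-9][a-z0-9.-]*(:.+)?$")


def is_valid_iqn(name: str) -> bool:
    """Loosely validate the IQN format (lowercase, dated, reversed domain)."""
    return bool(IQN_PATTERN.match(name))


def allowed(acl: dict[str, set[str]], lun: str, initiator: str) -> bool:
    """Return True if the target's access list lets this initiator log in."""
    return initiator in acl.get(lun, set())


if __name__ == "__main__":
    host_iqn = "iqn.1991-05.com.microsoft:hv-host1.contoso.local"
    vm_iqn = "iqn.1991-05.com.microsoft:vm1.contoso.local"
    acl = {"vm1-data": {vm_iqn}}                  # disk granted to the VM only
    print(is_valid_iqn(vm_iqn))                   # True
    print(allowed(acl, "vm1-data", vm_iqn))       # True
    print(allowed(acl, "vm1-data", host_iqn))     # False
```

Moving the disk to another VM amounts to swapping the IQN in the access list and connecting from the new VM's initiator; the host's configuration is never touched.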



SAN Backups and VM Resources

When considering the use of a SAN for a virtualized environment, pay special attention to its backup features. One very valuable feature is the ability to back up disks directly without the need for backup agents within the VM. VM‐installed agents tend to consume significant resources during the backup process, which can have a negative impact on the virtual environment’s overall performance. When SAN data is backed up directly from the SAN, VMs needn’t be impacted by backup operations. This capability represents another benefit to the use of pass‐through or direct‐attached iSCSI disks.

VM‐to‐VM Clustering

Yet another capability that can be used by environments with iSCSI SANs is the creation of clusters between VMs. This kind of clustering layers over the top of the clusters used by Hyper‐V hosts to ensure the high availability of VMs.

Consider the situation where a critical network resource requires the highest levels of availability in your environment. You may desire that resource to run atop a VM to gain the intrinsic benefits associated with virtualization, but you also want to ensure that the resource itself is clustered to maintain its availability during a VM outage. Even VMs must be rebooted from time to time due to monthly patching operations, so this is not an uncommon requirement. In this case, creating a VM‐to‐VM cluster for that network resource will provide the needed resiliency.

VM‐to‐VM clusters require the same kinds of shared storage as do Hyper‐V host‐to‐host clusters. Because of limits on the types of storage that can be attached to a VM, that shared storage can only be provided through iSCSI direct connection. Neither VHD attachment nor pass‐through disks can provide the necessary shared storage required by the cluster. In this architecture, a SAN disk is exposed and connected to both VMs via direct connection. The result is a network resource that can survive both the loss of a Hyper‐V host and the loss of a VM.

Host Boot from SAN

With SAN storage now so resilient that failure is rarely a concern, it becomes possible to move all data off your servers’ local disks. Eliminating local disks from servers accomplishes two things. First, it eliminates the distribution of storage throughout your environment, centralizing everything into a single, manageable SAN solution. Second, it abstracts the servers themselves, enabling a failed server to be quickly replaced by a functioning one. Because each server’s disks actually reside on the SAN, replacing a server becomes a trivial process.

Guest Boot from SAN

A final solution that can assist with the rapid provisioning of VMs is booting hosted VMs themselves from the SAN. Here, SAN disks are exposed directly to VMs via iSCSI, enabling them to boot directly from the exposed disk. This final configuration is not natively available in Windows Server 2008 R2, and as such requires a third‐party solution.


VM Performance Depends on Storage Performance

As you can see, in all of these architectures, the general trend is towards centralizing storage within the SAN infrastructure. By consolidating your storage into that single location, it is possible to perform some very useful management actions. Storage can be backed up with much less impact on server and VM processing. It can be replicated to alternate or offsite locations for archival or disaster recovery. It can be deduplicated, compressed, thin provisioned, or otherwise deployed with a higher expectation of utilization. In essence, while SAN storage for Hyper‐V might be more expensive than local storage, you should expect to use it more efficiently.

Chapter 3 will focus in greater detail on those specific capabilities to watch for. Yet there is another key factor associated with the centralization of storage that must be discussed here. That factor relates to storage performance.

It has already been said that the introduction of virtualization into an IT environment brings with it added complexities. These complexities arrive due to how virtualization adds layers of abstraction over traditional physical resources. That layer of abstraction is what makes VMs so flexible in their operations: They’re portable, they can be rapidly deployed, they’re easily restorable, and so on.

Yet that layer of abstraction also masks some of those resources’ underlying activities. For example, a virtual network card problem can occur because there is not enough processing power. A reduction in disk performance can be related to network congestion. An entire‐system slowdown can be traced back to spindle contention within the storage array. In any of these situations, the effective performance of the virtual environment can be impacted by seemingly unrelated elements.

Figure 1.5 shows how Hyper‐V’s reliance on multiple, interconnected elements creates multiple points at which bottlenecks can occur. For example, network contention can reduce the bandwidth available for passing storage traffic. The type and speed of drives in the storage array can limit how quickly they service requests. Even the connection medium itself (copper versus fibre, Cat 5 versus Cat 6a) can affect what resources are available to which servers. Smart Hyper‐V administrators must always be aware of and compensate for bottlenecks like these in the architecture. Without digging too deeply into their technical details, let’s look at a few that are common in Hyper‐V architectures.


Figure 1.5: Virtual environments have multiple areas where performance can bottleneck.

Network Contention

Every network connection has a hard limit on the quantity of traffic that can pass along it over a period of time. In many environments this maximum throughput is so large that monitoring it for individual servers is unnecessary. Yet networks that run virtual environments operate much differently than all‐physical ones. Consolidating multiple VMs atop a single host means a higher rate of resource utilization (the same “greater efficiency” discussed earlier). That efficiency comes at a price: the host’s network connections must now carry the combined traffic of every VM it runs.

Environments that move to virtualization must take into account the potential for network contention as utilization rates go up. This can be alleviated by adding new, fully separated network paths as well as more powerful networking equipment to handle the load. These paths range from simply aggregating multiple server NICs for failover protection to completely isolated connections through different network equipment. Because Hyper‐V’s VMs rely so heavily on their storage, distributing the load across multiple paths becomes absolutely necessary as the environment scales.
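
To make the contention risk concrete, the following Python sketch totals the average network demand of the VMs consolidated onto one host and compares it with the capacity of one, two, or four aggregated gigabit links. The per‐VM demand figures, the 70 percent headroom threshold, and the function name are hypothetical assumptions for illustration, not measured values or a Microsoft sizing rule.

```python
# Hypothetical sizing sketch: compare consolidated VM demand with host link capacity.
# All demand figures and the headroom threshold are illustrative assumptions.

def link_utilization(vm_demand_mbps, link_count, link_speed_mbps=1000):
    """Return the fraction of aggregate link capacity consumed by the VMs."""
    capacity = link_count * link_speed_mbps
    return sum(vm_demand_mbps) / capacity


# Assumed average demand (Mbps) for ten VMs consolidated onto a single host.
demand = [120, 80, 95, 150, 60, 110, 70, 200, 90, 130]

HEADROOM = 0.70  # assumed planning threshold, not a vendor recommendation

for links in (1, 2, 4):
    util = link_utilization(demand, links)
    status = "OK" if util <= HEADROOM else "add paths"
    print(f"{links} x 1GbE: {util:.0%} utilized -> {status}")
```

With these assumed figures, a single gigabit link would run at about 110 percent of capacity, while two aggregated links bring utilization to roughly 55 percent. Averages hide peaks, so real planning should use measured peak values.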

Another resolution involves modifying network parameters for specified connections. Microsoft’s Hyper‐V R2 supports the use of Jumbo Frames, an Ethernet‐level modification that raises the maximum frame size above the standard 1,500‐byte MTU so that larger frames can be passed across the network. With more payload data carried in each frame, fewer frames and headers are needed to move the same data, and the protocol overhead can be reduced by a significant percentage. The result is a performance increase over existing gigabit Ethernet connections.
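
The overhead argument can be checked with simple arithmetic. The following Python sketch estimates how much of each Ethernet frame carries TCP payload at the standard 1,500‐byte MTU and at a common 9,000‐byte jumbo MTU. The 9,000‐byte value and the header sizes are assumptions (IPv4, TCP without options, iSCSI PDU headers ignored); actual jumbo frame sizes vary by vendor.

```python
# Rough estimate of wire efficiency for TCP payload over standard and jumbo frames.
# Assumptions: IPv4 and TCP headers without options (20 bytes each) and standard
# Ethernet per-frame overhead; iSCSI PDU headers are ignored for simplicity.
import math

ETH_OVERHEAD = 8 + 14 + 4 + 12  # preamble/SFD, MAC header, FCS, interframe gap (bytes)
IP_TCP_HEADERS = 20 + 20        # IPv4 header + TCP header, no options (bytes)


def payload_per_frame(mtu):
    """TCP payload bytes carried in one full-sized frame."""
    return mtu - IP_TCP_HEADERS


def wire_efficiency(mtu):
    """Fraction of bytes on the wire that are TCP payload."""
    return payload_per_frame(mtu) / (mtu + ETH_OVERHEAD)


def frames_needed(data_bytes, mtu):
    """Number of frames (full or partial) required to carry data_bytes of payload."""
    return math.ceil(data_bytes / payload_per_frame(mtu))


ONE_MIB = 1024 * 1024
for mtu in (1500, 9000):
    print(f"MTU {mtu}: efficiency {wire_efficiency(mtu):.2%}, "
          f"{frames_needed(ONE_MIB, mtu)} frames per MiB")
```

Under these assumptions the raw efficiency gain is only about four percentage points (roughly 94.9 versus 99.1 percent). The more significant effect is likely that roughly one‐sixth as many frames must be processed per megabyte, which reduces per‐packet processing on servers, switches, and storage.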


Note Jumbo Frames are not enabled by default on servers, networking equipment, or most SAN storage devices. Consult your manufacturer’s guide for the specific details on how to enable this support. Be aware that Jumbo Frames must be enabled on every interface in each path between servers and storage.
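
Because the usable frame size along a path is limited by its smallest MTU, a single interface left at the default undoes the configuration. The following Python sketch takes a hypothetical list of hops between a Hyper‐V host and its storage, with MTU values you would gather from each device’s own configuration, and reports the effective path MTU and any hops that are not jumbo‐enabled. The hop names, MTU values, and the 9,000‐byte target are illustrative assumptions.

```python
# Hypothetical check that every hop between server and storage is jumbo-enabled.
# Hop names and MTU values are illustrative; gather real values from each device.

JUMBO_MTU = 9000  # assumed target MTU; vendors count header bytes differently


def check_path(hops, target=JUMBO_MTU):
    """Return (effective_path_mtu, names_of_hops_below_target)."""
    effective = min(mtu for _, mtu in hops)
    laggards = [name for name, mtu in hops if mtu < target]
    return effective, laggards


path = [
    ("host storage NIC", 9000),
    ("access switch port", 9000),
    ("core switch uplink", 1500),  # left at the default
    ("SAN target port", 9000),
]

mtu, missing = check_path(path)
print(f"Effective path MTU: {mtu}")
if missing:
    print("Enable jumbo frames on:", ", ".join(missing))
```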

Connection Redundancy & Aggregation

Connection redundancy in virtual environments is necessary for two reasons. First, the redundant connection provides an alternate path for data should a failure occur. With external cables connecting servers to storage in iSCSI architectures, the potential for an accidental disconnection is high. For this reason, connection redundancy using Multipath I/O (MPIO) or Multiple Connections per Session (MCS) is strongly suggested. Both protocols are roughly equivalent in terms of effective performance; however, SAN interfaces often support only one of the two options. Support for MPIO is generally more common in today’s SAN hardware.

The second reason redundancy is necessary in virtualized environments is for augmenting bandwidth. iSCSI connections to Hyper‐V servers can be aggregated using MPIO or MCS for the express purpose of increasing the available throughput between server and storage. In fact, Microsoft’s recommendation for iSCSI connections in Hyper‐V environments is to aggregate multiple gigabit Ethernet connections in all environments. Ensure that your chosen SAN storage has the capability of handling this kind of link aggregation across multiple interfaces. Chapter 2 will discuss redundancy options in much greater detail.

Consider 10GbE

The 10GbE standard was first published in 2002, but its adoption is only now ramping up with the increased network needs of virtualized servers. 10GbE interface cards and drivers are available today from most first‐party server and storage vendors. Be aware that using 10GbE between servers and storage requires a path that is fully 10GbE compliant, including all connecting network equipment, cabling, and interfaces. Cost can be a concern here. Because 10GbE is a relatively new technology, it is substantially more expensive across the board, and 10GbE bandwidth can cost more than aggregating multiple 1GbE connections together.

Type and Rotation Speed of Drives

The physical drives in the storage system itself can also be a bottleneck. Multiple drive types exist today for providing disk storage to servers: SCSI, SAS, and SATA are all types of server‐quality drives that have at some point been available for SAN storage. Some drives are intended for low‐utilization archival storage, while others are optimized for high‐speed reads and writes. Virtualization environments require high‐performance drives with high I/O rates to ensure good performance of the VMs they host.


One primary element of drive performance is each drive’s rotation speed. Today’s SAN drives support speeds of up to 15,000 RPM, with higher rotation speeds resulting in greater performance. Studies have shown that the rotation speed of storage drives has a greater impact on overall VM performance than slow network conditions or connection maximums. Storing VHD files on high‐performance drives can therefore substantially improve VM performance.
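
One way to see why rotation speed matters is average rotational latency: on average, the platter must turn half a revolution before the requested sector arrives under the head. The following Python sketch computes that figure and a rough single‐drive estimate of random I/Os per second as 1 / (average seek time + average rotational latency). The seek times are assumed, typical‐order values rather than specifications for any particular drive, and the model ignores caching, queuing, and transfer time.

```python
# Rough per-drive random I/O estimate from rotation speed.
# Seek times below are assumed, typical-order values, not vendor specifications.

def avg_rotational_latency_ms(rpm):
    """Time for half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def estimated_iops(rpm, avg_seek_ms):
    """Simplified model: one random I/O per (seek + rotational latency)."""
    return 1000 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


drives = {7200: 8.5, 10_000: 4.5, 15_000: 3.5}  # rpm -> assumed average seek (ms)

for rpm, seek in drives.items():
    print(f"{rpm:>6} RPM: latency {avg_rotational_latency_ms(rpm):.2f} ms, "
          f"~{estimated_iops(rpm, seek):.0f} IOPS")
```

Under these assumptions, a 15,000 RPM drive delivers roughly 180 random I/Os per second versus about 80 for a 7,200 RPM drive.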

Spindle Contention

Another potential bottleneck that can occur within the storage device itself is an oversubscription of disk resources. Remember that files on a disk are linearly written to the individual platters as required by the OS. Hyper‐V environments tend to leverage large LUNs with multiple VMs hosted on a single LUN. Multiple hosts have access to those VMs via their iSCSI connections, and process their workloads as necessary.

Spindle contention occurs when too much activity is requested for the files on a small area of disk. Consider, for example, two high‐utilization VMs whose VHD files are located near each other on the SAN’s disks. When these two VMs have a high rate of change, they require greater than usual attention from the disk’s spindle as it traverses the platters to read and write data. When the hardware spindle cannot keep up with the load placed upon it, the result is a reduction in storage (and, therefore, VM) performance.

Once identified, this problem can be resolved by relocating some of the data to another position in the SAN array. Identifying it, however, is not something an administrator can easily do; the tools for determining where contention is and isn’t occurring simply aren’t at an administrator’s disposal. Thus, protection against spindle contention is often handled automatically by the SAN device itself. When considering the purchase of a SAN storage device for Hyper‐V, look for one that can automatically monitor and proactively fix spindle contention issues or that uses storage virtualization to abstract the physical location of data. Also, work with your SAN vendor to obtain guidance on how best to architect your SAN infrastructure for the lowest possible level of spindle contention.
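
Although the physical placement of data is hidden from administrators, a rough capacity check can flag LUNs that are likely to suffer contention before VMs are placed on them. The following Python sketch sums the assumed peak IOPS of the VMs placed on each LUN and compares the result with the number of spindles backing that LUN. All LUN names, VM names, IOPS figures, and the per‐drive estimate are hypothetical, and the sketch deliberately ignores RAID write penalties and array caching, which your SAN vendor’s sizing guidance should cover.

```python
# Hypothetical planning check: does each LUN have enough spindles for its VMs?
# VM IOPS figures and the per-drive IOPS estimate are illustrative assumptions.
import math

PER_DRIVE_IOPS = 180  # assumed estimate for a 15,000 RPM drive


def spindles_needed(total_iops, per_drive=PER_DRIVE_IOPS):
    """Minimum whole number of drives needed to serve total_iops."""
    return math.ceil(total_iops / per_drive)


placement = {
    # LUN name: (spindles backing the LUN, {VM name: assumed peak IOPS})
    "LUN-A": (8, {"sql01": 900, "exch01": 650}),
    "LUN-B": (8, {"web01": 120, "web02": 110, "file01": 300}),
}

for lun, (spindles, vms) in placement.items():
    demand = sum(vms.values())
    need = spindles_needed(demand)
    flag = "possible contention" if need > spindles else "ok"
    print(f"{lun}: {demand} IOPS needs {need} spindles, has {spindles} -> {flag}")
```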

Connection Medium & Administrative Complexity

Lastly comes the connection medium itself, with many options available to today’s businesses. The discussion on storage in this guide relates specifically to iSCSI‐based storage for a number of reasons. Administrative complexity, cost, in‐house experience, and existing infrastructure all factor into the type of SAN that makes the most sense for a business:

• Administrative complexity. iSCSI storage arrives as a network‐based wrapper around traditional SCSI commands. This wrapper means that traditional TCP/IP is used as its mechanism for routing. By consolidating storage traffic under the umbrella of traditional networking, IT operations needs to manage only a single layer of protocols to support both production and storage networking.


• In‐house experience. Fibre Channel‐based SANs tend to require specialized skills to correctly architect the SAN fabric between servers and storage. These skills are often unavailable in environments that do not have dedicated SAN administrators on‐site. Further, skills in working with traditional copper cabling do not directly translate to those needed for Fibre Channel connections, so additional training costs may be required.

• Cost. Because iSCSI relies on traditional networking devices for its routing, there is no need for additional cables and switching infrastructure to pass storage traffic to configured servers. Existing infrastructure components can be leveraged for all manner of iSCSI traffic. Further, when iSCSI traffic grows to the point where expansion is needed, the incremental costs per server are lower as well. Table 1.1 shows an example breakdown of costs to connect one server to storage via a single connection. Although cables and switch ports both cost more with Fibre Channel, a major area of cost relates to the specialized Host Bus Adapter (HBA) that Fibre Channel also requires. In the case of iSCSI, existing gigabit copper network cards can be used.

Component                     Fibre Channel                         iSCSI
Host Bus Adapter              $800 to $2,000 (Fibre Channel HBA)    $100 to $200 (GbE copper NIC)
Cables                        $150                                  $15
Switch Port                   $500                                  $80
Connection Cost per Server    $1,450 to $2,650                      $195 to $295

Table 1.1: Comparing the cost to connect one server to storage via Fibre Channel versus iSCSI using a single connection.

• Existing infrastructure. Lastly, every IT environment already has a networking infrastructure in place that runs across traditional copper connections. Along with that infrastructure are usually the extra resources necessary to add connectivity such as cables, switch ports, and so forth. These existing resources can be easily repurposed to pass iSCSI traffic over available connections with a high degree of success.
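
The per‐server totals in Table 1.1 are simple sums, and the same arithmetic extends to a whole environment. The following Python sketch reproduces the table’s low and high figures and multiplies them out for a hypothetical four‐server cluster with two connections per server. The server and connection counts are assumptions, and the sketch assumes each connection needs its own adapter, cable, and switch port, which overstates cost where dual‐port adapters are used.

```python
# Reproduces the per-connection cost ranges from Table 1.1 and scales them up.
# The server count and connections per server used below are hypothetical.

COSTS = {
    # medium: {component: (low, high)} in US dollars, from Table 1.1
    "Fibre Channel": {"adapter": (800, 2000), "cable": (150, 150), "switch_port": (500, 500)},
    "iSCSI": {"adapter": (100, 200), "cable": (15, 15), "switch_port": (80, 80)},
}


def per_connection(medium):
    """Return the (low, high) cost of one server-to-storage connection."""
    parts = COSTS[medium].values()
    return sum(low for low, _ in parts), sum(high for _, high in parts)


def environment_cost(medium, servers, connections_per_server):
    """Assumes every connection needs its own adapter, cable, and switch port."""
    low, high = per_connection(medium)
    n = servers * connections_per_server
    return low * n, high * n


print(per_connection("Fibre Channel"))        # (1450, 2650)
print(per_connection("iSCSI"))                # (195, 295)
print(environment_cost("Fibre Channel", 4, 2))  # (11600, 21200)
print(environment_cost("iSCSI", 4, 2))          # (1560, 2360)
```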


iSCSI Makes Sense for Hyper‐V Environments

Although useful for environments of all sizes, iSCSI‐based storage is particularly suited for those in small and medium enterprises. These enterprises likely do not have the Fibre Channel investment already in place, yet do have the necessary networking equipment and capacity to pass storage traffic with good performance.

Successfully accomplishing that connection with Hyper‐V, however, is still more than just a “Next, Next, Finish” activity. Connections must be made with the right level of redundancy as well as the right architecture if your Hyper‐V infrastructure is to survive any of a number of potential losses. Chapter 2 will continue this discussion with a look at the various ways to implement iSCSI storage with Hyper‐V. That explanation will show you how to easily add the right levels of redundancy and aggregation to ensure success with your Hyper‐V VMs.


Chapter 2: Creating Highly‐Available Hyper‐V with iSCSI Storage

It’s worth saying again that Hyper‐V alone is exceptionally easy to set up. Getting the basics of a Hyper‐V server up and operational is a task that can be completed in a few minutes and with a handful of mouse clicks. But in the same way that building a skyscraper is so much more than welding together a few I‐beams, creating a production‐worthy Hyper‐V infrastructure takes effort and planning to be successful.

The primary reason for this dissonance between “installing Hyper‐V” and “making it ready for operations” has to do with high availability. You can think of a Hyper‐V virtual infrastructure in many ways like the physical servers that exist in your data center. Those servers have high‐availability functions built into their hardware: RAID for drive redundancy, multiple power supplies, redundant network connections, and so on. Although each of these is a physical construct on a physical server, they represent the same kinds of things that must be replicated into the virtual environment: redundancy in networking through multiple connections and/or paths, redundancy in storage through multipathing technology, redundancy in processing through Live Migration, and so on.

Using iSCSI as the medium of choice for connecting servers to storage is fundamentally useful because of how it aggregates “storage” beneath an existing “network” framework. Thus, with iSCSI it is possible to use your existing network infrastructure as the transmission medium for storage traffic, all without needing a substantially new or different investment in infrastructure.

To get there, however, requires a few new approaches in how servers connect to that network. Hyper‐V servers, particularly those in clustered environments, tend to make use of a far, far greater number of network connections than any other server in your environment. With interfaces needed for everything from production networking to storage networking to cluster heartbeat, keeping each connection straight is a big task.

This chapter will discuss some best practices for connecting those servers properly. It starts with a primer on the iSCSI Initiator that is natively available in Windows Server 2008 R2. You must develop a comfort level with this management tool to be successful, as Hyper‐V’s high redundancy requirements mean that you’ll likely be using every part of its many wizards and tabs. In this section, you’ll learn about the multiple ways in which connections are aggregated for redundancy. With this foundation established, the chapter continues with a look at how and where connections should be aggregated in single and clustered Hyper‐V environments.


The Windows iSCSI Initiator: A Primer

Every iSCSI connection requires two partners. On the server is an iSCSI initiator. This initiator connects to one or more iSCSI targets that are located on a storage device elsewhere on the network. In order to create this connection, software is required at both ends. At the target is software that handles incoming connections, directs incoming initiator traffic to the right LUN, and ensures that security is upheld between LUNs and the initiators to which they are exposed.

Each computer connecting to that iSCSI storage device needs its own set of initiator software to manage its half of the connection. Native to Windows Server 2008 R2 (as well as previous editions, although we will not discuss them in this guide) is the Windows iSCSI Initiator. Its functionality is accessed through a link of the same name that is found under Administrative Tools. Figure 2.1 shows an example of the iSCSI Initiator Control Panel as seen in Windows Server 2008 R2. Here, three separate LUNs have been created using the iSCSI target device’s management toolset and exposed to the server.

Figure 2.1: The default iSCSI Initiator.


Note Storage connections are generally made first using the storage device’s management toolset. They are exposed via a fully‐qualified domain name or IP address to one or more servers that may or may not share concurrent access.

The process to accomplish this first step differs based on the storage device used. Each storage device vendor develops its own toolset for accomplishing this task, with some leveraging Web‐based utilities while others use client‐based utilities. Consult your vendor’s administrative guide for details on this process.

In the simplest of configurations, connecting an iSCSI target to an initiator is an exceptionally easy process. First, enter the target DNS name or IP address into the box labeled Target, and click Quick Connect. If settings have been correctly configured at the iSCSI SAN, with LUNs properly exposed to the server, a set of targets should appear in the box below. Selecting each target and clicking Connect will enable the connection.
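
Before using Quick Connect, it can save troubleshooting time to confirm that the server can reach the target portal at all. iSCSI targets listen on TCP port 3260 by default, so the following Python sketch simply attempts a TCP connection to that port. It tests network reachability only; it does not perform an iSCSI login or discovery, and the portal address shown is a placeholder from the documentation address range.

```python
# Minimal reachability test for an iSCSI target portal (TCP 3260 by default).
# This checks only that a TCP connection can be opened; it is not an iSCSI login.
import socket

ISCSI_PORT = 3260


def portal_reachable(host, port=ISCSI_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    target = "192.0.2.10"  # placeholder portal address; substitute your own
    print(target, "reachable:", portal_reachable(target, timeout=2.0))
```

A False result points to routing, firewall, or VLAN problems that should be resolved before troubleshooting LUN exposure on the SAN.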

At this point, the disk associated with that target’s LUN will be made available within the server’s Disk Management console. There, it must be brought online, initialized, and formatted to make it usable by the system.

From an architectural standpoint, a number of on‐system components must work in concert for this connection to occur. As Figure 2.2 shows, the operating system (OS) and its applications use a vendor‐supplied disk driver to access SCSI‐based disks. The SCSI layer in turn wraps its commands within iSCSI network traffic through the iSCSI Initiator, which itself resides atop the server’s TCP/IP communication. Incoming traffic arrives via a configured NIC and is unwrapped back into SCSI commands, which the system then uses to interact with its disks.
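
To illustrate the layering just described, the following Python sketch builds a SCSI READ(10) command descriptor block and places it inside a simplified iSCSI SCSI Command header (the 48‐byte Basic Header Segment defined in RFC 3720), which would then be written to a TCP connection. It is a teaching sketch only: many header fields are simply zeroed, it assumes a single‐level LUN below 256 and 512‐byte blocks, it omits login, digests, and data segments, and it is not how the Windows iSCSI Initiator is implemented.

```python
# Teaching sketch of SCSI-in-iSCSI encapsulation (simplified from RFC 3720).
# Login, digests, AHS, and data segments are omitted; many fields are zeroed.
import struct

SCSI_READ_10 = 0x28
ISCSI_OP_SCSI_COMMAND = 0x01
FLAG_FINAL = 0x80
FLAG_READ = 0x40


def read10_cdb(lba, blocks):
    """10-byte READ(10) CDB: opcode, flags, LBA, group, transfer length, control."""
    return struct.pack(">BBIBHB", SCSI_READ_10, 0, lba, 0, blocks, 0)


def scsi_command_bhs(cdb, lun, task_tag, cmd_sn, exp_stat_sn, expected_len):
    """48-byte Basic Header Segment for an iSCSI SCSI Command PDU."""
    lun_field = struct.pack(">BB6x", 0, lun)  # assumes simple addressing, lun < 256
    return struct.pack(
        ">BBHB3s8sIIII16s",
        ISCSI_OP_SCSI_COMMAND,    # opcode (immediate bit not set)
        FLAG_FINAL | FLAG_READ,   # F and R flags; task attribute left as 0
        0,                        # reserved
        0,                        # TotalAHSLength
        b"\x00\x00\x00",          # DataSegmentLength (no immediate data)
        lun_field,                # LUN
        task_tag,                 # Initiator Task Tag
        expected_len,             # Expected Data Transfer Length (bytes)
        cmd_sn,                   # CmdSN
        exp_stat_sn,              # ExpStatSN
        cdb.ljust(16, b"\x00"),   # SCSI CDB, padded to 16 bytes
    )


cdb = read10_cdb(lba=2048, blocks=8)
pdu = scsi_command_bhs(cdb, lun=1, task_tag=1, cmd_sn=1, exp_stat_sn=1,
                       expected_len=8 * 512)  # assumes 512-byte blocks
print(len(cdb), len(pdu))  # 10 48
```

The SCSI command itself occupies the last 16 bytes of the header, which is the sense in which iSCSI “wraps” SCSI before TCP/IP carries it to the target.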


Figure 2.2: Multiple elements work together to enable an OS to interact with iSCSI disks.

The simplicity of configuring LUNs with a single path belies the complexity that arrives with additional connections. Once your number of storage connections grows beyond one per server, the number of high‐availability options, and of configurations within each option, grows dramatically. To assist, let’s look at the three kinds of high‐availability options that are commonly considered today.

NIC Teaming

One option for high availability that many administrators consider first is NIC teaming at the server. This option is often considered first because of administrators’ familiarity with NIC teaming over production networks. Using this method, multiple NICs are bonded together through the use of a proprietary NIC driver. In Figure 2.2, this architecture would be represented by adding additional arrows and NICs below the element marked TCP/IP.

Although teaming or bonding together NICs for the purpose of creating storage connection redundancy is indeed an option, be aware that this configuration is neither supported nor considered a best practice by either Microsoft or most storage vendors. As such, it is not an option that most environments should consider for production deployments.


Although NIC teaming provides the kind of redundancy that works for traditional network connections, two alternative protocols have been developed that accomplish the same goal with better results. Multipath Input/Output (MPIO) and Multiple Connections per Session (MCS) are two very different protocols that enable multiple connections between servers and storage and are designed specifically for the needs of network storage traffic.

MCS

The first of these protocols is MCS. This protocol operates at the level of the iSCSI initiator (see Figure 2.3) and is part of the iSCSI protocol itself, defined within its RFC (RFC 3720). Its protocol‐specific technology enables multiple, parallel connections between a server and an iSCSI target. As a function of the iSCSI initiator, MCS can be used on any connection once the iSCSI initiator is enabled for use on the system.

Figure 2.3: Teaming connections with MCS. (Diagram layers, top to bottom: Operating System & Apps, Disk Driver, SCSI, iSCSI Initiator, TCP/IP, and two NICs, labeled “Teamed Connection with MCS.”)

To enable MCS for a storage connection, connect first to a target and then click Properties within the Targets tab of the iSCSI Initiator Control Panel. In the resulting screen, click the button marked MCS that is found at the bottom of the wizard. With MCS, multiple initiator IP addresses are connected to associated target portal IP addresses on the storage device (see Figure 2.4). These target portal IP addresses must first be configured on both the server and the storage device prior to connecting an initiator.


Unlike MPIO, which will be discussed next, MCS does not require any special multipathing technology to be coded by the manufacturer and installed on connecting servers; however, support must be available on the storage device. Consult your manufacturer’s documentation to verify whether MCS support is available within your storage hardware.

Figure 2.4: Configuring MCS for a storage connection.

MCS can be configured with one of five different policies, with each policy determining the behavior of traffic through the connection. Policies with MCS are configured per session and apply to all LUNs that are exposed to that session. As such, individual sessions between initiator and target are given their own policies. The five policies function as follows:

• Fail Over Only. With a failover policy, there is no load balancing of traffic across the session’s multiple connections. One path is used for all communication up until the failure of that path. When the active path fails, traffic is then routed through the standby path. When the active path returns, traffic is routed back through the original path.

• Round Robin. With Round Robin, all active paths are used for routing traffic, with communication rotated among the available paths in turn.


• Round Robin with a subset of paths. This third policy operates much like the Round Robin policy, with one important difference. Here, one or more paths are set aside as standby paths to be used similar to those in the Fail Over Only policy. These paths remain in standby until a primary path failure occurs. At that point, the standby path joins the surviving paths in the round robin rotation. When the failed primary path returns, traffic is again routed through that path and the subset path returns to standby.

• Least Queue Depth. The Least Queue Depth policy functions similarly to Round Robin, with the primary difference being in the determination of how traffic is load balanced across paths. With Least Queue Depth, each request is sent along the path that has the least number of queued requests.

• Weighted Paths. Weighted Paths provides a way to manually configure the weight of each path. Using this policy, each path is assigned a relative weight. Traffic will be balanced across each path based on that assigned weight.
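
These policies differ only in how the next path is chosen. The following Python sketch models four of them as interchangeable selection functions over a hypothetical set of paths, using in‐memory state (path health, standby flags, queue depths) in place of real connections. It illustrates the selection logic described above rather than the Microsoft initiator’s implementation; Weighted Paths is left out because how weights translate into path choice is implementation‐specific.

```python
# Illustrative models of the path-selection policies described above.
# Path health, standby flags, and queue depths are simulated in memory.
# All policies share the signature (paths, tick) so they are interchangeable.
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    healthy: bool = True
    standby: bool = False  # reserved path for Round Robin with a subset of paths
    queued: int = 0        # outstanding requests, used by Least Queue Depth


def _live(paths):
    """Healthy paths in priority order; raise if none remain."""
    live = [p for p in paths if p.healthy]
    if not live:
        raise RuntimeError("no healthy path to the target")
    return live


def fail_over_only(paths, tick):
    """Always use the first healthy path; the rest wait as standby."""
    return _live(paths)[0]


def round_robin(paths, tick):
    """Rotate requests across every healthy path."""
    live = _live(paths)
    return live[tick % len(live)]


def round_robin_with_subset(paths, tick):
    """Rotate across healthy primary paths; bring standby paths in only
    while at least one primary path has failed."""
    live = _live(paths)
    primary_down = any(not p.healthy and not p.standby for p in paths)
    pool = [p for p in live if primary_down or not p.standby] or live
    return pool[tick % len(pool)]


def least_queue_depth(paths, tick):
    """Send the request down the healthy path with the fewest queued requests."""
    return min(_live(paths), key=lambda p: p.queued)


paths = [Path("A"), Path("B"), Path("C", standby=True)]
print([round_robin_with_subset(paths, t).name for t in range(4)])  # ['A', 'B', 'A', 'B']
paths[0].healthy = False  # primary path A fails
print([round_robin_with_subset(paths, t).name for t in range(4)])  # ['B', 'C', 'B', 'C']
print(fail_over_only(paths, 0).name)                                # B
paths[1].queued, paths[2].queued = 5, 2
print(least_queue_depth(paths, 0).name)                             # C
```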

MPIO

Another option for connection redundancy is MPIO. This protocol accomplishes the same functional result as MCS but uses a different approach. With MPIO (see Figure 2.5), disk manufacturers must create drivers that are MPIO‐enabled. These disk drivers include a Device‐Specific Module (DSM) that enables the driver to orchestrate requests across multiple paths. The benefit with MPIO is in its positioning above the SCSI layer. There, a single DSM can be used to support multiple network transport protocols such as Fibre Channel and iSCSI.


Figure 2.5: Teaming connections with MPIO. (Diagram layers, top to bottom: Operating System & Apps, Disk Driver with MPIO DSM, SCSI, iSCSI Initiator, TCP/IP, and two NICs, labeled “Teamed Connection with MPIO.”)

If your hardware manufacturer supplies its own DSM, it must be installed on each server where you intend to use MPIO. Alternatively, a default DSM is available in Windows Server 2008 R2 that functions with many storage devices. Consult your manufacturer’s documentation to verify whether a separate vendor driver installation is required or whether the default Windows DSM is supported.

To use the default DSM with iSCSI storage, two steps are necessary. First, install the Multipath I/O Feature from within Server Manager. Installing this feature requires a reboot and makes available the MPIO Control Panel within Administrative Tools. Step two involves claiming all attached iSCSI devices for use with the default Microsoft DSM. Do this by launching the MPIO Control Panel and navigating to the Discover Multi‐Paths tab. There, select the Add support for iSCSI devices check box and reboot the computer. This process instructs the server to automatically claim all iSCSI devices for the Microsoft DSM, regardless of their vendor or product ID settings.

Once enabled, MPIO is configured through the iSCSI Initiator Control Panel. There, select an existing target and click Connect. In the resulting screen, select the Enable multipath check box and click Advanced. The Advanced settings screen for the connection provides a place where additional initiator IP addresses are connected to target portal IP addresses. Repeating this process for each source and target IP address connection will create the multiple paths used by MPIO.
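
Because each MPIO path is a separate initiator‐to‐portal pairing, it helps to plan those pairings before working through the Advanced settings screen for each one. The following Python sketch uses the standard ipaddress module to pair each initiator address with the target portal addresses on the same subnet. Pairing only within a subnet is an assumption reflecting a common design with one storage subnet per NIC; the addresses and /24 prefixes are placeholders.

```python
# Plan MPIO connections by pairing initiator IPs with same-subnet target portals.
# Addresses and prefixes are placeholders; same-subnet pairing is an assumed design.
import ipaddress

initiator_ips = ["10.10.1.21/24", "10.10.2.21/24"]  # host storage NICs
portal_ips = ["10.10.1.100", "10.10.1.101", "10.10.2.100", "10.10.2.101"]


def plan_paths(initiators, portals):
    """Return (initiator, portal) pairs where both addresses share a subnet."""
    pairs = []
    for iface_str in initiators:
        iface = ipaddress.ip_interface(iface_str)
        for portal_str in portals:
            if ipaddress.ip_address(portal_str) in iface.network:
                pairs.append((str(iface.ip), portal_str))
    return pairs


for source, target in plan_paths(initiator_ips, portal_ips):
    print(f"Initiator {source} -> target portal {target}")
```

Each printed pair corresponds to one initiator‐to‐portal connection to be created through the Advanced settings screen.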


Verifying path creation is accomplished by selecting an existing target and clicking Devices and then MPIO. The resulting screen, seen in Figure 2.6, displays the configured paths from the server to the target. Also in this location is the selection for configuring the load‐balance policy for the LUN.

Figure 2.6: Setting an MPIO load‐balance policy.

MPIO in Windows Server 2008 R2 can use any of the same five policies as MCS as well as one additional policy. Since the DSM operates at the level of the disk driver, it can additionally load balance traffic across routes based on the number of data blocks being processed. This sixth policy, named Least Blocks, will route each subsequent request down the path that has the fewest data blocks being processed.
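
Least Blocks differs from Least Queue Depth in what it counts: outstanding data blocks rather than outstanding requests. The following Python sketch, using hypothetical in‐flight request sizes, shows how the two policies can choose different paths when one path carries a few large requests and another carries many small ones.

```python
# Contrast Least Queue Depth (count requests) with Least Blocks (count blocks).
# In-flight request sizes, in blocks, are hypothetical.

in_flight = {
    "path1": [256, 256],       # two large requests
    "path2": [8, 8, 8, 8, 8],  # five small requests
}


def least_queue_depth(paths):
    """Path with the fewest outstanding requests."""
    return min(paths, key=lambda name: len(paths[name]))


def least_blocks(paths):
    """Path with the fewest outstanding data blocks."""
    return min(paths, key=lambda name: sum(paths[name]))


print("Least Queue Depth picks", least_queue_depth(in_flight))  # path1
print("Least Blocks picks", least_blocks(in_flight))            # path2
```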

It is important to note that policies with MPIO are applied to individual devices (LUNs), enabling each connected LUN to be assigned its own policy based on need. This behavior differs from MCS, where every LUN exposed through a single session must share the same policy.

Note Be aware that MPIO and MCS both achieve link redundancy using a protocol that exists above TCP/IP in the network protocol stack, while NIC teaming uses protocols that exist below TCP/IP. For this reason, each individual MPIO or MCS connection requires its own IP address that is managed within the iSCSI Initiator Control Panel. This is different than with NIC teaming, where ports are aggregated via the switching infrastructure and a single IP address is exposed.


Which Option Should You Choose?

MPIO and MCS are commonly considered to be relatively similar in their level of performance and overall manageability. Microsoft’s MPIO tends to use fewer processor resources than MCS, particularly under heavy loads; however, MCS tends to have slightly better performance as long as the number of connections per session remains low.

With this in mind, consider the following guidelines when determining which option for storage connection redundancy you should choose for your Hyper‐V environment:

• Traditional NIC teaming is not considered a best practice for storage connections.

• Some storage devices do not support the use of MCS. In these cases, your only option is to use MPIO.

• Use MPIO if you need to support different load‐balancing policies on a per‐LUN basis. This is suggested because MCS can only define policies on a per‐session basis, while MPIO can define policies on a per‐LUN basis.

• Hardware iSCSI HBAs tend to support MPIO over MCS as well as include other features such as Boot‐from‐iSCSI. When using hardware HBAs, consider using MPIO.

• MPIO is not available on Windows XP, Windows Vista, or Windows 7. If you need to create iSCSI direct connections to virtual machines running these OSes, you must use MCS.

• Although MCS provides marginally better performance than MPIO, its added processor utilization can have a negative impact in high‐utilization Hyper‐V environments. For this reason, MPIO may be a better selection for these types of environments.

Do I Need Hardware iSCSI HBAs?

This guide has talked extensively about the use of traditional server NICs as the medium for iSCSI network traffic. However, specialized hardware HBAs for iSCSI traffic exist as add‐ons. These specialized devices are dedicated to iSCSI connections and potentially provide a measure of added performance over traditional network cards. As such, you may be asking, “Do I need these special cards in my Hyper‐V servers?”

Today’s conventional wisdom answers this question with, “perhaps not.” iSCSI network processing represents a relatively small portion of the overall processing of SCSI disk commands in Windows, with the majority of processing occurring in the network stack, kernel, and file system. Windows Server 2008, in cooperation with server‐class NIC vendors, now includes support for a number of network optimizations (TCP Chimney, Receive Side Scaling, TCP Checksum Offload, Jumbo Frames) that improve the overall processing of network traffic, and therefore iSCSI processing as well.


One traditional configuration where hardware iSCSI HBAs have been necessary was when Boot‐from‐iSCSI was desired. These HBAs have typically included the necessary pre‐boot code needed to boot a server from an iSCSI SAN. However, today’s production NICs found in your everyday servers are beginning to natively support Boot‐from‐iSCSI, further driving the answer to this question towards a resounding “no.”
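
The MPIO‐versus‐MCS guidelines listed earlier in this section can be condensed into a small decision helper. The following Python function is one hypothetical encoding of those guidelines; the parameter names, the order in which the rules are checked, and the tie‐breaking default are assumptions, and your storage vendor’s support statement should always take precedence.

```python
# One hypothetical encoding of the MPIO-versus-MCS guidelines in this chapter.


def choose_redundancy(storage_supports_mcs, storage_supports_mpio,
                      client_os_initiator=False, per_lun_policies=False,
                      hardware_hba=False, high_cpu_load=False):
    """Return "MPIO" or "MCS". NIC teaming is never suggested for storage."""
    if not (storage_supports_mcs or storage_supports_mpio):
        raise ValueError("storage supports neither MCS nor MPIO")
    if client_os_initiator:
        # Windows XP, Vista, and 7 lack MPIO (e.g. iSCSI direct connections in VMs).
        if not storage_supports_mcs:
            raise ValueError("a client OS initiator needs MCS support on the storage")
        return "MCS"
    if not storage_supports_mcs:
        return "MPIO"
    if not storage_supports_mpio:
        return "MCS"
    if per_lun_policies or hardware_hba or high_cpu_load:
        return "MPIO"
    # Assumed tie-breaker: MCS may perform marginally better with few connections.
    return "MCS"


print(choose_redundancy(True, True, per_lun_policies=True))     # MPIO
print(choose_redundancy(True, True, client_os_initiator=True))  # MCS
print(choose_redundancy(False, True))                           # MPIO
```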

Getting to High Availability with Hyper‐V

All of this discussion prepares us to answer the primary question: How does one achieve high availability with Hyper‐V and iSCSI? With all the architectural options available, answering this question best requires an iterative approach. That approach starts by recognizing that every implementation of Hyper‐V that desires true high availability must achieve it via the Windows Failover Clustering feature. This general‐purpose clustering solution enables Windows Servers to add high availability to many different services, Hyper‐V being only one in its long list. Thus, being successful with highly‐available Hyper‐V also requires skills in Windows Failover Clustering. While the details of installing and working with Windows Failover Clustering are best left for other publications, this chapter and guide will assist with correctly creating the needed storage and networking configurations.

The second point to remember is that Windows Failover Clustering requires the use of shared storage between all hosts. Using Cluster Shared Volumes (CSV) in Windows Server 2008 R2, this storage is actively shared by all cluster nodes, with all nodes accessing connected LUNs at once. Microsoft’s CSV transparently handles the necessary arbitration between cluster nodes to ensure that only one node at a time can interact with a Hyper‐V virtual machine or its configuration files.

Going a step further, Hyper‐V and its high‐availability clusters can be created in many ways, with a range of redundancy options available depending on your needs, available hardware, and level of workload criticality. Obviously, the more redundancy you add to the environment, the more failures you can protect yourself against, but also the more you’ll spend to get there.

It is easiest to visualize these redundancy options by iteratively stepping through them, starting with the simplest options first. The next few sections start with a very simple single‐server implementation that enjoys redundant connections. Through the sections that follow, you’ll see where additional levels of redundancy can be added to protect against various types of failures.

To keep the figures simple, color‐coding has been used for the connections between server and network infrastructure. That color‐coding is explained in Figure 2.7. As you can see, Production Network (or “virtual machine”) connections are marked in green, with Storage Network connections marked in red. As Hyper‐V clusters are scaled out, it is a best practice to separate management traffic from virtual machine traffic; where appropriate, management connections are labeled in black. It is also recommended that a separate network be reserved for the cluster’s heartbeat communication. That connection has been labeled in blue where appropriate.



Figure 2.7: Color coding for the following set of figures.

Single Server, Redundant Connections

The simplest of configurations involves creating a single‐server Hyper‐V environment (see Figure 2.8). Here, a single server connects to its network via a single network switch. This configuration differs from the overly‐simplistic diagram first seen in Figure 1.1 in that both the Production Network and Storage Network connections have been made redundant.

Figure 2.8: Single Hyper‐V server, redundant connections.

In this architecture, Hyper‐V server traffic is segregated into two different subnets. This is done to separate storage traffic from production networking traffic, and is an important configuration because of Hyper‐V’s very high reliance on its connection to virtual machine storage. Separating traffic in this manner ensures that an overconsumption of traditional network bandwidth does not impact the performance of running virtual machines.

Both connections in this architecture have also been made redundant, though through different means. Here, Production Network traffic is teamed using a link aggregation protocol such as 802.3ad, while Storage Network traffic is aggregated using MPIO or MCS.

Single Server, Redundant Path

Although this first configuration eliminates some points of failure through its addition of extra connections, the switch to which those connections are made becomes a very important single point of failure. Should the switch fail, every Hyper‐V virtual machine on the server will cease to operate.



To protect against this situation, further expansion of connections can be made to create a fully‐redundant path between the Hyper‐V server and the production network core as well as between Storage Network NICs and the storage device. Figure 2.9 shows how this might look.

Figure 2.9: Single Hyper‐V server, fully‐redundant path.

In this configuration, Production Network connections for the single Hyper‐V server can either remain in their existing configuration to the single network switch or they can be broken apart and directed to different switches. This option is presented here and elsewhere with an “optional” dashed line because not all networking equipment supports aggregating links across different switch devices.

This limitation with NIC teaming highlights one of the benefits of MPIO and MCS. Due to these protocols’ position above TCP/IP, each Storage Network connection leverages its own IP address. This address can be routed through different paths as necessary with the protocol reassembling data on either end.
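To make the multipath idea more concrete, here is a minimal Python sketch of round‐robin path selection with failover across Storage Network paths, each identified by its own IP address. The class names, the example addresses, and the simple round‐robin policy are illustrative assumptions; this is not the Windows MPIO implementation or its API, only a model of the behavior described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StoragePath:
    """One iSCSI path: initiator NIC address plus target portal address (hypothetical)."""
    initiator_ip: str
    target_ip: str
    healthy: bool = True


@dataclass
class RoundRobinMultipath:
    """Hypothetical model of a round-robin multipath policy with failover."""
    paths: List[StoragePath]
    _next: int = field(default=0, init=False)

    def mark_failed(self, initiator_ip: str) -> None:
        # Take every path that uses the failed NIC out of rotation.
        for path in self.paths:
            if path.initiator_ip == initiator_ip:
                path.healthy = False

    def next_path(self) -> StoragePath:
        # Walk the path list at most once, skipping unhealthy paths.
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path.healthy:
                return path
        raise RuntimeError("all storage paths have failed")


mp = RoundRobinMultipath([
    StoragePath("10.0.1.11", "10.0.1.100"),  # Storage NIC 1 (example addresses)
    StoragePath("10.0.2.11", "10.0.2.100"),  # Storage NIC 2 on a second subnet
])
print([mp.next_path().initiator_ip for _ in range(4)])  # alternates between NICs
mp.mark_failed("10.0.1.11")
print([mp.next_path().initiator_ip for _ in range(3)])  # only NIC 2 remains
```

Because each path carries its own IP address, the two paths can traverse entirely different switches, which is what allows storage traffic to survive the loss of a switch in a redundant‐path design.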

Note It is also important to recognize in any redundant path configuration that a true “redundant path” requires separation of traffic at every hop between server and storage. This requirement can make redundant pathing an expensive option when supporting networking equipment is not already in place.
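The “separation at every hop” requirement can be checked mechanically. The sketch below models a topology as an undirected graph and reports every intermediate device whose removal disconnects a server from its storage, that is, a single point of failure. The device names are hypothetical stand‐ins for the components in Figures 2.8 and 2.9.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def reachable(graph: Dict[str, Set[str]], start: str, goal: str, removed: str) -> bool:
    """Breadth-first search from start to goal, ignoring the removed device."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in graph.get(node, set()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(links: List[Tuple[str, str]], server: str, storage: str) -> List[str]:
    """Return every intermediate device whose loss cuts the server off from storage."""
    graph: Dict[str, Set[str]] = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    candidates = sorted(set(graph) - {server, storage})
    return [d for d in candidates if not reachable(graph, server, storage, d)]


# Figure 2.8 style: two storage NICs, but both cabled to one switch.
single_switch = [("hv1", "nic1"), ("hv1", "nic2"),
                 ("nic1", "switchA"), ("nic2", "switchA"), ("switchA", "san")]
# Figure 2.9 style: each storage NIC goes through its own switch.
redundant_path = [("hv1", "nic1"), ("hv1", "nic2"),
                  ("nic1", "switchA"), ("nic2", "switchB"),
                  ("switchA", "san"), ("switchB", "san")]

print(single_points_of_failure(single_switch, "hv1", "san"))   # ['switchA']
print(single_points_of_failure(redundant_path, "hv1", "san"))  # []
```

Note that the sketch treats the server and the storage device as endpoints, so it never flags them; the storage device as a single point of failure is a separate concern.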



Hyper‐V Cluster, Minimal Configuration

Yet even the most highly‐available network path doesn’t help when a Hyper‐V server’s motherboard dies in the middle of the night. To protect against the loss of an individual server, Hyper‐V must run atop a Windows Failover Cluster. This service enables virtual machines to be hosted by any server in the cluster, and enables the failover of virtual machine ownership from one host to another.

As such, creating a Hyper‐V cluster protects against an entirely new set of potential failures. Such a cluster (see Figure 2.10) requires that all virtual machines be stored elsewhere on network‐attached disks, with all cluster nodes having concurrent access to their LUNs.

Figure 2.10: Hyper‐V Cluster, minimal configuration.

Microsoft’s recommended minimum configuration for an iSCSI‐based Hyper‐V cluster (or, indeed, any iSCSI‐based Windows Failover Cluster) requires at least three connections that exist on three different subnets. Like before, one connection each is required for Production Network and Storage Network traffic. A third connection is required to handle intra‐cluster communication, commonly called the “heartbeat.” This connection must be segregated because cluster communication has a low tolerance for latency.
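A quick way to sanity‐check this minimum configuration is to confirm that each required role sits on its own subnet. The sketch below uses Python’s standard ipaddress module; the role names, the example addresses, and the /24 prefixes are assumptions chosen for illustration only.

```python
import ipaddress
from typing import Dict, List

REQUIRED_ROLES = ("production", "storage", "heartbeat")


def check_cluster_node(nics: Dict[str, str]) -> List[str]:
    """Return a list of problems with one node's NIC plan (empty list means OK).

    nics maps a role name to an interface address in CIDR form, e.g. "10.0.2.21/24".
    """
    problems = []
    missing = [role for role in REQUIRED_ROLES if role not in nics]
    if missing:
        problems.append("missing roles: " + ", ".join(missing))
    # Map each role to the subnet its address belongs to.
    networks = {role: ipaddress.ip_interface(addr).network for role, addr in nics.items()}
    seen = {}
    for role, net in networks.items():
        if net in seen:
            problems.append(f"{role} shares subnet {net} with {seen[net]}")
        else:
            seen[net] = role
    return problems


good = {"production": "192.168.10.21/24", "storage": "10.0.2.21/24", "heartbeat": "10.0.3.21/24"}
bad = {"production": "192.168.10.21/24", "storage": "192.168.10.22/24"}
print(check_cluster_node(good))  # no problems
print(check_cluster_node(bad))   # missing heartbeat, storage on the production subnet
```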

Hyper‐V Cluster, Redundant Connections

A cluster configuration like the one just described actually removes some redundancy even as it adds other redundancy. Although the previous configuration can survive the loss of a cluster host, it is no longer redundant from the perspective of the network connections coming out of each server.

What is needed at this point is to merge the redundant connections of the single‐server configuration with the now‐clustered configuration. That updated architecture is shown in Figure 2.11. There, both Production Network and Storage Network connections have been made redundant to the network switch using the protocols explained earlier.




Figure 2.11: Hyper‐V cluster, redundant connections.

You’ll also see in Figure 2.11 that an additional black line has been drawn between both Hyper‐V servers and the network switch. This line represents an additionally‐segregated network connection that is used for managing Hyper‐V. It is considered a best practice with mature Hyper‐V implementations to segregate the management of a Hyper‐V server from the networks used by its virtual machines. This is done for several reasons:

• Segregation of security domains—Virtual machines operate at a different security trust level than management traffic. By segregating virtual machine traffic from management traffic, virtual machines can be better monitored. Further, management connections cannot be used to intercept virtual machine communications.

• Segregation of Live Migration traffic—Transferring ownership of a virtual machine from one host to another can consume a large amount of available bandwidth over a short period of time. This consumption can have a negative impact on the operations of other virtual machines. By moving Live Migration traffic off the virtual machine network and onto a separate subnet (here, the one it shares with management traffic), this effect is eliminated.

• Protection of management functionality—In the case where a network attack is occurring on one or more virtual machines, segregating management traffic ensures that the Hyper‐V host can still be managed while troubleshooting and repair functions are completed. Without this separate connection, an attacker may be able to deny administrators the access they need to resolve the problem.
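Some rough arithmetic shows why Live Migration traffic deserves its own subnet. The sketch below estimates how long a single copy of a virtual machine’s memory keeps a link saturated. The memory size, the link speeds, and the 70% efficiency factor are assumptions, and real Live Migration also re‐copies memory pages that change during the transfer, so actual times will be longer.

```python
def transfer_seconds(memory_gib: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate seconds to copy a VM's memory once over a link.

    efficiency is an assumed fraction of line rate actually achieved.
    """
    bits = memory_gib * 1024**3 * 8            # GiB -> bits
    usable_bps = link_gbps * 1e9 * efficiency  # Gbit/s -> usable bit/s
    return bits / usable_bps


for link in (1, 10):
    t = transfer_seconds(memory_gib=8, link_gbps=link)
    print(f"8 GiB over {link} GbE: ~{t:.0f} s of saturated link")
```

At 1 Gb/s, a single 8 GiB virtual machine occupies the link for well over a minute, which is time that other virtual machines sharing that link would feel.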

Hyper‐V Cluster, Redundant Path

Finally, this discussion culminates with the summation of all the earlier architectures, combining redundant paths with a fully‐functioning Hyper‐V cluster. This architecture (see Figure 2.12) enjoys all the benefits of the previous iterations at once, yet requires the greatest number of connections as well as the greatest complexity.



Figure 2.12: Hyper‐V cluster, fully‐redundant path.

The added cost of an environment such as this should be obvious. Each cluster node requires a minimum of six network connections, spread across two halves of a switching infrastructure. Because Hyper‐V cannot share RAM between running virtual machines, every node must hold enough spare memory to absorb the virtual machines of a failed node; spreading that reserve across more nodes shrinks each node’s share of it, so Hyper‐V clusters operate most efficiently with more nodes rather than fewer. Each added node, however, brings six more connections: a four‐node cluster will require 24 connections, an eight‐node cluster 48 connections, and so on.
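The arithmetic in this paragraph is easy to reproduce. The sketch below assumes the six connections per node break down as two Production, two Storage, one Management, and one Heartbeat, and it also computes the share of cluster RAM held in reserve to survive a node failure. It assumes identical nodes and tolerance of exactly one node failure.

```python
# Assumed breakdown of the six connections per node in the fully-redundant design.
CONNECTIONS_PER_NODE = {"production": 2, "storage": 2, "management": 1, "heartbeat": 1}


def cluster_connections(nodes: int) -> int:
    """Total network connections for a fully-redundant-path cluster."""
    return nodes * sum(CONNECTIONS_PER_NODE.values())


def reserved_ram_fraction(nodes: int, failures_tolerated: int = 1) -> float:
    """Fraction of total cluster RAM left idle so surviving nodes can absorb failed ones.

    Assumes identical nodes and no memory sharing between virtual machines.
    """
    if nodes <= failures_tolerated:
        raise ValueError("cluster cannot tolerate that many failures")
    return failures_tolerated / nodes


for n in (2, 4, 8):
    print(n, cluster_connections(n), f"{reserved_ram_fraction(n):.1%}")
```

Doubling the node count doubles the cabling but halves the idle RAM reserve, which is the trade‐off the paragraph describes.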

High Availability Scales with Your Pocketbook

With all this added expenditure comes protection against many common problems. Individual servers can fail and virtual machines will automatically relocate elsewhere. Disks can fail while their data remains protected by the storage device’s RAID. Individual connections can fail with the assurance that surviving connections will maintain operations. Even an entire switch can fail without taking down the cluster. It is important to recognize that your need for high availability depends on your tolerance for loss. As with physical servers, more redundancy costs you more but ensures higher reliability.

But reliability in Hyper‐V’s storage subsystem is fundamentally critical as well. If you create all these connections but attach them to a less‐than‐exemplary SAN, then you’ve still set yourself up for failure. Finding the right features and capabilities in such a storage device is critical to maintaining those virtual machine disk files as they’re run by cluster nodes.

The next chapter of this book takes a step back from the Hyper‐V part of a Hyper‐V architecture, and delves deep into just those capabilities that you probably will want in your SAN infrastructure. It will discuss how certain SAN capabilities that are only now becoming available are specifically designed to assist virtualization infrastructures.



Chapter 3: Critical Storage Capabilities for Highly‐Available Hyper‐V

Chapter 2 highlighted the fact that high availability is fundamentally critical to a successful Hyper‐V infrastructure. This is the case because uncompensated hardware failures in any Hyper‐V infrastructure have the potential to be much more painful than what you’re used to seeing in traditional physical environments.

A strong statement, but think for a minute about this increased potential for loss: In any virtual environment, your goal is to optimize the use of physical equipment by running multiple virtual workloads atop smaller numbers of physical hosts. Doing so gives you fantastic flexibility in managing your computing environment. But doing so, at the same time, increases your level of risk and impact to operations. When ten workloads, for example, are running atop a single piece of hardware, the loss of that hardware can affect ten times the infrastructure and create ten times the pain for your users.

Due to this increased level of risk and impact, you must plan appropriately to compensate for the range of failures that can potentially occur. The issue here is that no single technology solution compensates for every possible failure. What is needed is a set of solutions that work in concert to protect the virtual environment against the full range of possibilities.

Depicted in Figure 3.1 is an extended representation of the previous chapter’s fully‐redundant Hyper‐V environment. There, each Hyper‐V server connects via multiple connections to a networking infrastructure. That networking infrastructure in turn connects via multiple paths to the centralized iSCSI storage infrastructure. Consider for a minute which failures are compensated for through this architecture:

• Storage and Production Network traffic can survive the loss of a single NIC due to the incorporation of 802.3ad network teaming and/or MPIO/MCS.

• Storage and Production Network traffic can also survive the loss of an entire network switch due to the incorporation of 802.3ad network teaming and/or MPIO/MCS that has been spread across multiple switches.

• Virtual machines can survive the planned outage of a Hyper‐V host through Live Migration as a function of Windows Failover Clustering.

• Virtual machines can also be quickly returned to service after the unplanned outage of a Hyper‐V host as a function of Windows Failover Clustering.

• Network oversubscription and the potential for virtual machine denial of service are inhibited through the segregation of network traffic across Storage, Production, Management, and Heartbeat connections.




Figure 3.1: Hyper‐V environments require a set of solutions to protect against all of the possible failures.

The risk associated with each of these potential failures has been mitigated through the implementation of multiple layers of redundancy. However, this design hasn’t necessarily taken into account its largest potential source of risk and impact. Take another look at Figure 3.1. In that figure, one element remains that in and of itself can become a significant single point of failure for your Hyper‐V infrastructure. That element is the iSCSI storage device itself.

Each and every virtual machine in your Hyper‐V environment requires storage for its disk files. This means that any uncompensated failure in that iSCSI storage has the potential to take down each and every virtual machine all at once, and with it goes your business’ entire computing infrastructure. As such, there’s a lot riding on the success of your storage infrastructure. This critical recognition should drive some important decisions about how you plan for your Hyper‐V storage needs. It is also the theme behind this guide’s third chapter.

Virtual Success Is Highly Dependent on Storage

In the end, storage really is little more than just a bunch of disks. You must have enough disk space to store your virtual machines. You must also have enough disk space for all the other storage accoutrements that a business computing environment requires: ISO files, user home folders, space for business databases, and so on. Yet while raw disk space itself is important, the architecture and management of that disk space is exceptionally critical to virtualization success in ways that might not be immediately obvious.



Chapter 2 introduced the suggestion that the goal for SAN availability is “no nines,” or what amounts to 100% availability. Although this requirement might seem an impossibility at first blush, it is in fact a necessity. The operational risk of a SAN failure is made even more painful by the level of impact such an event will have on your environment. As a result, your goal in selecting, architecting, and implementing your SAN is to ensure that its design contains no single points of failure.
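To put availability percentages in perspective, the sketch below converts an availability level into allowed downtime per year, assuming a 365‐day year. It shows why only the “no nines” goal of 100% leaves the SAN contributing no downtime at all.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of allowed downtime per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


for level in (99.9, 99.99, 99.999, 100.0):
    print(f"{level}%: {downtime_minutes_per_year(level):.1f} min/year")
```

Even “five nines” allows roughly five minutes of SAN outage per year, and with every virtual machine depending on the SAN, each of those minutes is an outage for the entire environment.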

Today’s iSCSI SAN equipment accomplishes this lofty goal through the concurrent implementation of a set of capabilities that layer on top of each other. This layered approach to eliminating points of failure ensures that surviving hardware always has the resources and data copies it needs to continue serving the environment without interruption.

“Non‐Interruptive” Is Important

This concept of non‐interruptive assurance during failure conditions is also critical to your SAN selection and architecture. Your selected SAN must be able to maintain its operations without interruption as failures occur. Although non‐interruptive in this definition might mean an imperceptibly slight delay as the SAN re‐converges after a failure, that delay must be shorter than the timeout tolerance of the servers to which it is connected. As you’ll discover later in this chapter, non‐interruptive operation is important not only during failures but also during maintenance and management operations.

The easiest way to understand how this approach brings value is through an iterative look at each compensating layer. The next few sections will discuss how today’s best‐in‐class iSCSI SAN hardware has eliminated the SAN as a potential single point of failure.

Modular Node Architecture

Easily the most fundamental new approach to eliminating the single point of failure is eliminating the “single point” approach to SAN hardware. Modern iSCSI SAN hardware accomplishes this by packaging SAN hardware into individual and independent modules, or “nodes.” These nodes can be used independently if needed for light or low‐priority uses. Or, they can be logically connected through a storage network to create an array of nodes.

Figure 3.2 shows a logical representation of how this architecture might look. Here, four independent storage nodes have been logically connected using their built‐in management software and a dedicated storage network. Within each node are 12 disks for data storage as well as all the other necessary components such as processors, power supplies, NICs, and so on. The result of connecting these four devices is a single logical iSCSI storage device. That device has the capacity to present the summation of each device’s available storage to users and servers.



Figure 3.2: Multiple storage nodes aggregate to create a single logical device.

Important to recognize here is that each device can be an independent entity or aggregated with others to modularly increase the capacity of the SAN. Nodes can be added or removed as the data needs of their owner change over time. This gives such a SAN a useful advantage over more traditional monolithic approaches: Its capacity can be expanded or otherwise modified as necessary without the need for wholesale hardware replacements.

Consider as an alternative the more traditional monolithic SAN. These devices rely on the population of a storage “frame” with disks, storage processors, and switch fabric devices. In this type of SAN, there is a physical limit to the amount of storage that can be added into such a frame. Once that frame is full to capacity, either additional frames must be purchased or existing disks or frames must be swapped out for others that have greater capacity. The result can be a massive capital expenditure when specific threshold limits are exceeded.

Using the modular approach, new modules can be added to existing ones at any point. Management software within each module is used to complete the logical connection through the dedicated storage network. That same software can be configured to automatically accomplish post‐augmentation tasks such as volume restriping and re‐optimization on behalf of the administrator. This chapter will talk more about these management functions shortly.



Redundant Storage Processors Per Node

Modularization alone does nothing to enhance storage availability. It also does nothing to enhance the resiliency of the individual node and its data. However, it does provide the framework within which many of the advanced availability features discussed in the following sections operate.

Every storage device requires some sort of processor in order to accomplish its stated mission. Although some processors leverage entirely proprietary code, many processors today rest atop highly‐tailored distributions of existing operating systems (OSs) such as Linux or Windows Storage Server. No matter which OS is at its core, one architectural element that is critical to ensuring node resiliency is the use of redundant storage processors within each individual node.

Figure 3.3 shows how this might look in a storage device that is composed of four nodes. Here, each individual node includes two storage processors that are clustered for the purposes of redundancy. With this architecture in place, the loss of a storage processor will not impact the functionality of the individual node.

Figure 3.3: Multiple storage processors per node ensure individual node resiliency.

This architecture comes in particularly handy when nodes are used independently. In this configuration, a single node can survive the loss of a storage processor without experiencing an interruption of service.

Redundant Network Connections and Paths

Redundancy in processing is a great feature, but even multiple storage processors cannot assist when network connections go down. Network failure is in fact such a common occurrence that the entirety of Chapter 2 was dedicated to highlighting the server‐to‐SAN connections that are required for Hyper‐V.



Yet that discussion in Chapter 2 did not include one critical redundancy element that is shown in Figure 3.4. This redundancy becomes relevant when used in the framework of a modular SAN architecture. There, each individual storage node has also been connected to the storage network using redundant connections.

Figure 3.4: Redundant connections and paths relate to inter‐node communication as well as server‐to‐node communication.

Important to recognize here is that this configuration is necessary not only for resiliency but also for raw throughput. Because each individual storage node is likely to be accessed by multiple servers, the network throughput required into and out of each node can exceed what a single connection can provide. Although iSCSI storage nodes typically have at least two network connections per node, those that support extremely high throughput may include four or more to carry the necessary load.
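Estimating the number of connections a storage node needs is largely division plus a redundancy margin. The sketch below is a back‐of‐the‐envelope estimate only; the per‐server demand figures, the 1 Gb/s link speed, the 80% target utilization, and the rule of adding one spare link are all assumptions.

```python
import math
from typing import List


def nics_needed(server_demands_mbps: List[float], link_mbps: float = 1000.0,
                target_utilization: float = 0.8, spare_links: int = 1) -> int:
    """Estimate a storage node's NIC count: enough links for peak load plus spares.

    Assumes load spreads evenly across links (e.g., via multiple MPIO/MCS sessions).
    """
    total = sum(server_demands_mbps)
    usable_per_link = link_mbps * target_utilization
    return max(1, math.ceil(total / usable_per_link)) + spare_links


# Four Hyper-V hosts, each assumed to push about 450 Mb/s of storage traffic at peak.
print(nics_needed([450, 450, 450, 450]))
```

With these assumed figures, four hosts already push the node to four connections, the upper end of the range the paragraph mentions.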

Note Measuring that performance is a critical management activity. iSCSI storage nodes tend to come equipped with the same classes of performance counters that you’re used to seeing on servers: Processor, network, and memory utilization are three that are common. Connecting these counters into your monitoring infrastructure will ensure that your Hyper‐V server needs aren’t oversubscribing any part of your SAN infrastructure.

Disk‐to‐Disk RAID

RAID has been around for a long time. So long, in fact, that it is one of those few acronyms that doesn’t need to be written out in full when used in guides like this one. Although RAID has indeed had a long history in IT, it’s important to recognize that it is another high‐availability feature that you should pay attention to as you consider a SAN storage device for Hyper‐V.



The reason behind this special consideration has to do with the many types of RAID protection that SANs can deploy over and above those traditionally available within individual servers. These added RAID levels are made possible in many ways due to the sheer number of disks that are available within an individual storage node.

Figure 3.5 shows a graphical representation of how some of these might look. In addition to the usual RAID 1 (mirroring), RAID 5 (striping with parity), and RAID 0+1 (disks are striped, then mirrored) options that are common to servers, SANs can often leverage additional RAID options such as RAID‐with‐hot‐spares, RAID 6 (striping with double parity), and RAID 10 (disks are mirrored, then striped), among others.

Figure 3.5: Disk‐to‐disk RAID in iSCSI storage devices is functionally similar to RAID within individual servers.

These alternative options become necessary as SANs grow in size, because more disks bring a greater potential for multiple disk failures. The traditional RAID levels used in servers are designed to protect against a single disk failure; they cannot guarantee protection when more than one disk fails in the same volume. The added level of protection gained through advanced RAID techniques becomes increasingly necessary when large numbers of individual disks are present in each storage node.
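The trade‐off between capacity and protection can be tabulated. The sketch below computes usable capacity and the number of disk failures each common RAID level is guaranteed to survive for n equal disks. It ignores hot spares and vendor‐specific variants, and for RAID 10 it reports the worst case: some combinations of multiple failures survive, but not all.

```python
from typing import Tuple


def raid_profile(level: str, disks: int, disk_tb: float) -> Tuple[float, int]:
    """Return (usable TB, disk failures guaranteed survivable) for n equal disks."""
    if level == "RAID 1":            # n-way mirror: every disk holds a full copy
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_tb, disks - 1
    if level == "RAID 5":            # one disk's worth of parity
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb, 1
    if level == "RAID 6":            # two disks' worth of parity
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) * disk_tb, 2
    if level == "RAID 10":           # two-way mirrors, striped: half the raw space
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return disks / 2 * disk_tb, 1
    raise ValueError(f"unknown level {level}")


# A single 12-disk node with assumed 2 TB disks.
for level in ("RAID 5", "RAID 6", "RAID 10"):
    usable, survives = raid_profile(level, disks=12, disk_tb=2.0)
    print(f"{level}: {usable:.0f} TB usable, survives any {survives} disk failure(s)")
```

On a 12‐disk node, RAID 6 gives up only one additional disk of capacity compared with RAID 5 in exchange for guaranteed survival of a second disk failure.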

Node‐to‐Node RAID

Another RAID capability that is not common to server disk drives is the capacity to span volume redundancy across multiple nodes. In fact, this feature alone is one of the greatest reasons to consider implementing a multiple‐node architecture for the storage of Hyper‐V virtual machines as well as other critical data.



In Figure 3.6, the red boxes that represent intra‐node RAID have been augmented with another set of purple boxes. This second set of boxes highlights how node‐to‐node RAID configurations can span individual nodes. In this configuration, volumes have been configured in such a way that every piece of data on one node (or its parity information) is always replicated to one or more additional nodes in the logical device.

Figure 3.6: Node‐to‐node RAID ensures that entire nodes can fail with no impact to operations.

Note Although Figure 3.6 shows an example of a RAID set that has been created across only a few disks in a few nodes, it is more common that RAID sets are created across every disk in the entire logical storage device. By creating a hardware RAID set in this manner, the entire device’s storage can then be made available to exposed volumes.

Depending on the storage device selected, multiple levels of node‐to‐node RAID are possible with each having its own benefits and costs. For example, each block of data can be replicated across two nodes. This configuration ensures that a block of data is always in two places at once. As an alternative that adds redundancy but also adds cost, each block can be replicated across three nodes, ensuring availability even after a double‐node failure.
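The replication‐factor trade‐off can be modeled in a few lines. The sketch below places each block’s copies on distinct nodes using a simple rotating scheme (an assumption; real arrays use their own placement logic), then finds how many simultaneous node failures any layout is guaranteed to survive.

```python
from itertools import combinations
from typing import Dict, List, Set


def place_blocks(num_blocks: int, nodes: List[str], copies: int) -> Dict[int, List[str]]:
    """Place each block on `copies` distinct nodes, rotating the starting node."""
    if copies > len(nodes):
        raise ValueError("cannot keep more copies than there are nodes")
    return {b: [nodes[(b + i) % len(nodes)] for i in range(copies)]
            for b in range(num_blocks)}


def survives(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    """True if every block has at least one copy on a node that has not failed."""
    return all(any(node not in failed for node in homes) for homes in placement.values())


def worst_case_tolerance(placement: Dict[int, List[str]], nodes: List[str]) -> int:
    """Largest k such that ANY k simultaneous node failures leave all data available."""
    k = 0
    while k < len(nodes) and all(survives(placement, set(combo))
                                 for combo in combinations(nodes, k + 1)):
        k += 1
    return k


nodes = ["node1", "node2", "node3", "node4"]
for copies in (2, 3):
    layout = place_blocks(num_blocks=8, nodes=nodes, copies=copies)
    print(copies, "copies -> survives any", worst_case_tolerance(layout, nodes), "node failure(s)")
```

Each extra copy also consumes another full share of raw capacity, which is the added cost of the three‐node option described above.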

This architecture is critically important for two reasons. First, it extends the logical storage device’s availability to protect against failures of an entire node or even multiple nodes. The net result is the creation of a storage environment that is functionally free of single points of failure.



Second, such an architecture also allows the logical storage device’s volumes to grow larger than a single node. Considering the large size of Hyper‐V virtual machines, extremely large volumes may be necessary, including volumes larger than a single node alone can support.

Modularization Plus Node‐to‐Node RAID Equals Swap‐Ability

Interesting to note here is how node‐to‐node RAID goes hand‐in‐hand with modularization. This combination of capabilities enables SAN hardware to be very easily replaced in the case of an entire‐node failure, making the individual node itself a hot‐swappable item. Think for a minute about how this might occur: Every block of data on such a SAN is always replicated to at least one other storage node. Thus, data remains protected when a node fails. When a failure occurs, an administrator needs only to remove the failed node and swap it with a functioning replacement. With minimal configuration, the replacement can automatically reconnect with the others in the logical storage device and synchronize the necessary data. As a result, even an entire node failure becomes as trivial as an individual disk failure.

Integrated Offsite Replication for Disaster Recovery

And yet even these capabilities don’t protect against the ultimate failure: the loss of an entire operational site. Whether that loss is due to a natural disaster, a man‐made one, or a misconfiguration that results in massive data destruction, there sometimes comes the need to relocate business operations in their entirety to a backup site.

What’s particularly interesting about disaster recovery and its techniques and technologies is that many are newcomers into the IT ecosystem. Although every business has long desired a fully‐featured disaster recovery solution, only in the past few years have the technologies caught up to make this dream affordable.

Needed at its core is a mechanism to replicate business data as well as data processing to alternate locations with an assurance of success. Further, that replication needs to occur in a way that minimizes bandwidth requirements. To be truly useful, it must also be a solution that can be implemented without the need for highly‐specialized training and experience. In the case of a disaster, your business shouldn’t need specialists to fail over your operations to a backup site or to fail them back to the primary site when the disaster is over.
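One common way to minimize replication bandwidth is to send only the blocks that changed since the last replication cycle. The sketch below detects changed fixed‐size blocks by comparing SHA‐256 digests. The 4 KiB block size and the in‐memory byte strings standing in for volumes are assumptions; commercial SANs typically track changed blocks internally rather than rehashing whole volumes.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4096  # assumed block size in bytes


def block_digests(volume: bytes) -> List[bytes]:
    """SHA-256 digest of each fixed-size block of the volume."""
    return [hashlib.sha256(volume[i:i + BLOCK_SIZE]).digest()
            for i in range(0, len(volume), BLOCK_SIZE)]


def changed_blocks(volume: bytes, last_digests: List[bytes]) -> Dict[int, bytes]:
    """Return {block index: block data} for blocks that differ from the last cycle."""
    changes = {}
    for index, digest in enumerate(block_digests(volume)):
        if index >= len(last_digests) or digest != last_digests[index]:
            start = index * BLOCK_SIZE
            changes[index] = volume[start:start + BLOCK_SIZE]
    return changes


def apply_changes(replica: bytearray, changes: Dict[int, bytes]) -> None:
    """Write received blocks into the backup-site replica in place."""
    for index, data in changes.items():
        start = index * BLOCK_SIZE
        if len(replica) < start + len(data):
            replica.extend(b"\x00" * (start + len(data) - len(replica)))
        replica[start:start + len(data)] = data


primary = bytearray(b"A" * BLOCK_SIZE * 4)      # a tiny four-block "volume"
replica = bytearray(primary)                    # backup site starts in sync
baseline = block_digests(bytes(primary))
primary[BLOCK_SIZE * 2:BLOCK_SIZE * 2 + 5] = b"hello"  # a VM writes to block 2
delta = changed_blocks(bytes(primary), baseline)
apply_changes(replica, delta)
print(sorted(delta), replica == primary)
```

Only one 4 KiB block crosses the wide‐area link here instead of the whole volume, which is the bandwidth saving the paragraph calls for.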

Today’s best‐in‐class iSCSI SANs include the capability to connect a primary‐site SAN to a backup‐site SAN as Figure 3.7 shows. This being said, such a connection is a bit more than just plug‐and‐go. There are some careful considerations that are important to being successful, most especially when SAN data consists of Hyper‐V virtual machines.

Cross‐Reference Chapter 4 will explore the architectures and requirements for disaster recovery in more detail.



Figure 3.7: Automated replication of changed data to alternate sites protects against entire site loss.

Non‐Interruptive Capacity for Administrative Actions

It has already been stated that architecting your storage infrastructure is exceptionally important to success with Hyper‐V. Yet getting that storage up and operational is only the first step in actually using your Hyper‐V virtual environment. It’s also the shortest step. Longer in timeframe and arguably more important are the management activities you’ll undergo after the installation is complete.

The processes involved with managing Hyper‐V storage often get overlooked when the initial architecture and installation is planned. However, these same administrative tasks, when not planned for, can cause complications and unnecessary outages down the road. No matter which action needs to be accomplished, your primary goal should be an ability to invoke those actions with the assurance that they will not interrupt running virtual machines.

If these statements sound alarmist, consider the long‐running history of storage technologies. In the not‐too‐distant past, otherwise simple tasks became operational impacts due to their need for volume downtime. These tasks included basic administrative actions such as extending an existing volume to add more disk space, installing a firmware upgrade, or augmenting the environment with additional nodes or frames. In the most egregious of examples, simple tasks such as these sometimes required the presence of on‐site assistance from manufacturer storage technicians.

That historical limitation added substantial complexity and cost to SAN ownership. Today, such limitations are wholly unacceptable given the availability requirements of a virtual infrastructure. Your business simply can’t bring down every virtual machine when you need to make a small administrative change to your storage.



With this in mind, consider the following set of administrative activities that are common to all storage environments. Your SAN hardware should be able to accomplish each of them without interruption to virtual machine processing or other concurrent data access. Further, they also represent actions that a sufficiently‐experienced administrator should be able to accomplish with the right hardware and minimal tool‐specific instruction.

Note With these activities, iSCSI isn’t alone. Many of the features explained in the following sections should be available in other types of SAN equipment, such as those that leverage Fibre Channel connections. Often, however, these features are only available at extra cost. This is an important consideration when purchasing a new storage infrastructure. Look carefully at the capabilities offered by your SAN vendor to ensure that the right set of management activities is available for your needs. For some vendors, you may need to purchase the rights to use certain management functions. As an alternative, look to an all‐inclusive SAN vendor that does not price out advanced functionality at extra cost.

Volume Activities

Early monolithic SAN infrastructures required complex configuration file changes when volumes needed reconfiguration. For some vendors, this configuration file change was an exceptionally painful operation, often requiring the on‐site presence of trained professionals to ensure its successful implementation.

Today, volume changes are relatively commonplace activities. Administrators recognize that provisioning too much storage to a particular volume takes away disk space from other volumes that might need it down the road. It is for this reason that today’s best practices in volume size assignment are to maintain a small but constant percentage of free space. This sliding window of available space can require administrators to constantly monitor and adjust sizes as needed. Some SANs have the capability to automatically scale the size of volumes per preconfigured thresholds. No matter which method you use, this activity on today’s iSCSI SANs should not require downtime to either the volume or connected users and servers.
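The “small but constant percentage of free space” practice, and the automatic scaling some SANs offer, boil down to a simple rule. The sketch below computes a new volume size whenever free space drops below a threshold. The 15% threshold, the 20% target, and rounding up to whole gigabytes are assumed values, and the function only calculates a size; it does not call any SAN’s management interface.

```python
import math


def next_volume_size_gb(size_gb: float, used_gb: float,
                        min_free: float = 0.15, target_free: float = 0.20) -> int:
    """Return the size a volume should grow to, or its current size if no growth is needed.

    Grows only when the free fraction falls below min_free, then sizes the volume
    so that free space returns to target_free (rounded up to a whole GB).
    """
    free_fraction = (size_gb - used_gb) / size_gb
    if free_fraction >= min_free:
        return math.ceil(size_gb)
    # used / new_size = 1 - target_free  =>  new_size = used / (1 - target_free)
    return math.ceil(used_gb / (1.0 - target_free))


print(next_volume_size_gb(500, 400))  # 20% free: no change
print(next_volume_size_gb(500, 450))  # 10% free: grow
```

Using two thresholds (grow below 15%, grow back to 20%) keeps the volume from being resized on every small write, which matches the “sliding window” the paragraph describes.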

Advanced SANs provide the capability to accomplish other volume‐based tasks without interruption as well. These tasks can relate to changing how a volume is provisioned, such as thin‐provisioned versus pre‐allocated, or configured RAID settings. For example, volumes that start their operational life cycle as a low priority resource may later grow in criticality and require additional RAID protection. That reconfiguration should occur without interruption to operations.

Storage Node Activities

Activities associated with the storage node itself should also be accomplished without impact to data access. For example, adding, removing, or replacing storage nodes from a logical storage device are tasks that can and should be possible without interruption. Important to recognize here are the non‐interruptive internal activities that must occur in the background after such a dramatic change to the storage environment:

• Adding a node automatically restripes existing volumes across the new node, balancing storage across the now‐larger logical storage device.

• Removing a node automatically relocates data off the node prior to the actual removal activity, ensuring that data remains available even after the node has been removed from the logical storage device.

• Replacing a node automatically rebuilds volumes from surviving data on the remaining nodes.
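The restriping and relocation behaviors in the list above can be illustrated with a small placement model. In this hypothetical Python sketch, a volume is divided into numbered extents, each stored on one node, and `rebalance` moves only the extents that sit on over‐full or departing nodes. Real SANs use their own vendor‐specific placement algorithms, so treat this as an illustration of the idea, not of any product's behavior.

```python
from collections import Counter


def rebalance(placement: dict, nodes: list) -> dict:
    """Return a new extent->node map spread evenly across `nodes`.

    Extents on nodes that are no longer present, plus any surplus on
    over-full nodes, are moved; everything else stays where it is.
    """
    new = dict(placement)
    base, extra = divmod(len(new), len(nodes))
    # Target extent count per node; the first `extra` nodes take one more.
    target = {n: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}
    load = Counter(n for n in new.values() if n in target)

    # Extents on removed nodes must move (relocation before removal).
    to_move = [e for e, n in sorted(new.items()) if n not in target]
    # Surplus extents on over-full nodes move too (restriping after an add).
    for e, n in sorted(new.items()):
        if n in target and load[n] > target[n]:
            load[n] -= 1
            to_move.append(e)

    # Fill nodes that are below their target, in node order.
    for e in to_move:
        dest = next(n for n in nodes if load[n] < target[n])
        new[e] = dest
        load[dest] += 1
    return new


# Twelve extents on two nodes; a third node is added.
before = {e: ("A" if e < 6 else "B") for e in range(12)}
after = rebalance(before, ["A", "B", "C"])
moved = sum(before[e] != after[e] for e in before)
print(Counter(after.values()), moved)
```

Only four of the twelve extents move when the third node joins, and calling `rebalance(after, ["A", "C"])` drains node B before it is removed. On a real device, this data movement runs in the background while volumes stay online.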

Another useful cross‐node activity is the use of automated volume restriping to reduce spindle contention. This problem of spindle contention was first introduced in Chapter 1 and can have a larger‐than‐normal impact on storage that is part of a virtualization infrastructure. In essence, when the disk use of virtual machines becomes greater than expected, virtual machines whose disk files share the same disk spindles in the SAN infrastructure will experience a bottleneck. Collocated virtual machines in this situation experience a collective reduction in performance as each vies for attention by the storage device.

To alleviate this situation, some storage devices have the ability to watch for spindle contention and transparently relocate data files to alternate locations on disk. The result is a more optimized distribution of storage hardware resources across the entire logical device as well as better overall performance for virtual machines.
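A simplified version of that contention check is sketched below. The layout, the IOPS figures, and the `relieve_hotspot` function are hypothetical. The sketch assumes the device can measure per‐file I/O on each spindle group and move one virtual disk file at a time, and it makes a move only when the move lowers the peak load.

```python
from typing import Dict, Optional, Tuple

Layout = Dict[str, Dict[str, int]]  # spindle group -> {VM disk file: IOPS}


def relieve_hotspot(layout: Layout) -> Optional[Tuple[str, str, str]]:
    """Move one file from the busiest spindle group to the least busy one.

    Returns (file, source, destination), or None if no move helps.
    """
    load = {s: sum(files.values()) for s, files in layout.items()}
    hot = max(load, key=load.get)
    cool = min(load, key=load.get)
    if hot == cool or not layout[hot]:
        return None

    def peak_after(f: str) -> int:
        # Busier of the two groups if file f were relocated.
        x = layout[hot][f]
        return max(load[hot] - x, load[cool] + x)

    best = min(layout[hot], key=peak_after)
    if peak_after(best) >= load[hot]:
        return None  # relocating would not reduce contention
    layout[cool][best] = layout[hot].pop(best)
    return best, hot, cool


layout = {
    "group1": {"sql.vhd": 900, "exchange.vhd": 700},
    "group2": {"web.vhd": 100},
    "group3": {"file.vhd": 300},
}
print(relieve_hotspot(layout))
```

Here the Exchange disk file moves to the quiet spindle group, lowering the peak from 1,600 to 900 IOPS. Running the check again finds no move that would help, so it leaves the layout alone.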

Data Activities

Storage arrays commonly include the ability to snapshot volumes as well as replicate them to other locations within and outside the logical device. Snapshotting activities are critical to reducing backup windows. They also provide the ability to quickly create point‐in‐time copies of virtual machines for testing or other purposes.

Replication is often necessary when virtual machines or other data must be offloaded to alternate volumes or logical storage devices. This can be due to a forklift upgrade of the logical storage device or because it is necessary to create copies of volumes for device‐to‐device replication. As with the other activities, completing these data‐related activities should be a non‐interruptive process.

Firmware Activities

Last is the not‐uncommon activity of updating the firmware on individual storage nodes. All storage devices require occasional firmware updates in order to add features, eliminate bugs, and prevent known attacks.

This updating of SAN firmware must be an operation that does not require downtime. Downtime prevention may occur as a function of multiple storage processors or in using an OS that can implement updates without requiring a reboot.

Storage Virtualization

The concepts that embody storage virtualization share little with those that are associated with traditional server virtualization. However, they do share the same high‐level meaning in that storage virtualization is also about abstraction. In the case of storage virtualization, the abstraction exists between logical storage (RAID sets, volumes, and so on) and the actual physical storage where that data resides.

You’ve already been exposed in this chapter to many of the capabilities that fall under the banner of storage virtualization: The ability to snapshot a drive abstracts the snapshot from the bits in its initial volume. Restriping a volume across multiple nodes requires a layer of abstraction as well. Accomplishing this task requires a meta‐layer atop the volume that maps the logical storage to physical locations.

In the context of virtualization atop platforms such as Hyper‐V, storage virtualization brings some important management flexibility. It accomplishes this through the introduction of new features that improve the management of Hyper‐V virtualization. Let’s look at a few of these features in the following sections.

Snapshotting and Cloning

Creating snapshots of volumes enables administrators to work with segregated copies of data but without the need to create entirely duplicate copies of that data. For example, consider the situation where you need to test the implementation of an update to a set of virtual machines on a volume. Using your SAN’s snapshotting technology, it is possible to create a duplicate copy of that entire volume. Because the volume has been created as a snapshot rather than a full copy, the time to complete the snapshot is significantly reduced. The level of consumed space is also only a fraction of the overall volume size.

Once created, actions like the aforementioned update installation can be completed on the snapshot volume. If the results are a success, the snapshot can be merged into the original volume or discarded.
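The space savings come from storing only what changes. The hypothetical Python sketch below models a writable snapshot as a map of changed blocks layered over an unchanged base volume, with the merge and discard choices described above. It assumes that the base volume is not written while the snapshot exists; real arrays handle that case with copy‐on‐write or redirect‐on‐write techniques that this sketch leaves out.

```python
class Volume:
    def __init__(self, blocks: list):
        self.blocks = list(blocks)


class WritableSnapshot:
    """Shares every block with its base until that block is written."""

    def __init__(self, base: Volume):
        self.base = base
        self.delta = {}  # block index -> contents written to the snapshot

    def read(self, idx: int) -> bytes:
        # Changed blocks come from the snapshot; the rest from the base.
        return self.delta.get(idx, self.base.blocks[idx])

    def write(self, idx: int, data: bytes) -> None:
        self.delta[idx] = data

    def merge(self) -> None:
        # Test succeeded: fold the changed blocks into the original volume.
        for idx, data in self.delta.items():
            self.base.blocks[idx] = data
        self.delta.clear()

    def discard(self) -> None:
        # Test failed: drop the changes; the original never saw them.
        self.delta.clear()


base = Volume([b"os", b"app-v1", b"data", b"log"])
snap = WritableSnapshot(base)
snap.write(1, b"app-v2")  # install the update on the snapshot only
print(snap.read(1), base.blocks[1], len(snap.delta), len(base.blocks))
```

In this example, the snapshot consumes one block of new space against a four‐block volume, while the original volume still holds the pre‐update contents until `merge` is called.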

Backup and Restore with VSS Integration

Snapshots are useful for other reasons as well. Backup operations are made much easier through the use of snapshots. Integrating those snapshots with Microsoft’s Volume Shadow Copy Service (VSS) ensures that backups successfully capture the state of the virtual machine along with its installed applications. Without VSS integration, installed applications and their data may not be correctly backed up. When seeking a SAN for a virtualized environment, look for one that supports VSS integration to ensure consistent backups of these types of applications.

Volume Rollback

A key advanced feature is the ability for volumes to be rolled backwards in time. This need can occur after a significant data loss or data corruption event. Combining snapshot technology with the capacity to store multiple snapshot iterations gives the administrator a series of time‐based volume snapshots. Rolling a volume to a previous snapshot quickly returns the volume to a state before the deletion or corruption occurred. Further, volume rollback can more quickly return a corrupted volume to operations than traditional restore techniques.
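Rolling back depends on keeping several snapshots and picking the newest one taken before the damage. This hypothetical sketch keeps time‐stamped copy‐on‐write snapshots, where each snapshot saves a block's original contents the first time that block is overwritten, and it restores the latest snapshot that strictly precedes a given event time. The class and method names are illustrative, and snapshot times are assumed to be recorded in increasing order.

```python
import bisect


class SnapshottedVolume:
    def __init__(self, blocks: list):
        self.blocks = list(blocks)
        self.snaps = []  # (time, {block index: contents at snapshot time})

    def snapshot(self, t: float) -> None:
        self.snaps.append((t, {}))  # empty: nothing has changed yet

    def write(self, idx: int, data: bytes) -> None:
        # Copy-on-write: each snapshot saves the block's prior contents once.
        for _, saved in self.snaps:
            saved.setdefault(idx, self.blocks[idx])
        self.blocks[idx] = data

    def rollback_before(self, event_time: float) -> float:
        """Restore the newest snapshot taken before event_time."""
        times = [t for t, _ in self.snaps]
        i = bisect.bisect_left(times, event_time) - 1
        if i < 0:
            raise ValueError("no snapshot precedes the event")
        t, saved = self.snaps[i]
        for idx, data in saved.items():
            self.blocks[idx] = data
        del self.snaps[i + 1:]  # later snapshots describe discarded state
        saved.clear()           # volume now matches this snapshot exactly
        return t


vol = SnapshottedVolume([b"x", b"y"])
vol.snapshot(1.0)
vol.write(0, b"x1")
vol.snapshot(2.0)
vol.write(1, b"y2")
vol.write(0, b"CORRUPT")  # corruption event at t = 3.0
print(vol.rollback_before(3.0), vol.blocks)
```

Because each snapshot stores only the blocks that changed after it was taken, keeping a series of them costs far less than keeping full copies, and a rollback touches only those changed blocks.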

Thin Provisioning

Last is the capability for volume thin provisioning. As discussed earlier in this chapter, today’s best practices suggest that volumes should be configured to maintain only a small level of free space. Keeping that level small ensures that available disk space can always be assigned to the volumes that need it.

One problem with this approach relates to how an OS will make use of an assigned volume. Unlike storage devices, OSs tend to create statically‐sized volumes for their configured disk drives. Thus, every storage device volume extension must be followed by a manual volume extension within the OS.

A method to get around this limitation is the use of thin provisioning. Here, a volume is presented to the OS for its anticipated size needs. On the storage device, however, the true size of the volume is only as large as the actual data being consumed by the OS. The storage device’s volume automatically grows in the background as necessary to provide free space for the OS. The result is that the OS’s volume does not need expansion while the storage device’s volume only uses space as necessary. This process significantly improves the overall utilization of free space across the storage device.
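The gap between what the OS sees and what the device allocates can be shown in a few lines. In this hypothetical sketch, `ThinVolume` presents a fixed virtual size and backs it with physical space in 1 GB chunks only when a chunk is first written. The chunk size and class name are assumptions for illustration.

```python
class ThinVolume:
    """Presents virtual_gb to the OS; allocates real space only on write."""

    def __init__(self, virtual_gb: int, chunk_gb: int = 1):
        self.virtual_gb = virtual_gb
        self.chunk_gb = chunk_gb
        self.allocated = set()  # indexes of chunks backed by physical disk

    def write(self, offset_gb: float) -> None:
        if not 0 <= offset_gb < self.virtual_gb:
            raise ValueError("write beyond the size presented to the OS")
        # The first write into a chunk allocates it; later writes reuse it.
        self.allocated.add(int(offset_gb // self.chunk_gb))

    @property
    def physical_gb(self) -> int:
        return len(self.allocated) * self.chunk_gb


vol = ThinVolume(virtual_gb=500)
for offset in (0, 0.5, 1, 10):
    vol.write(offset)
print(vol.virtual_gb, vol.physical_gb)
```

The OS sees a 500 GB volume and never needs to extend it, while the device has allocated only the 3 GB actually touched.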

Caution Use care when leveraging thin provisioning to ensure that the real allocation of disk space doesn’t exceed the true level of available disk space. Proper monitoring and alerting of storage space is critical to prevent this catastrophic event from occurring.
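A minimal form of that monitoring compares what has been promised with what physically exists. The `check_pool` function and its 80% warning threshold below are hypothetical. In practice, the thresholds and alerting channels come from your storage device's management tools.

```python
def check_pool(pool_gb: float, volumes: list, warn_pct: float = 80.0) -> list:
    """volumes: (presented_gb, allocated_gb) pairs for thin volumes in a pool."""
    presented = sum(p for p, _ in volumes)
    allocated = sum(a for _, a in volumes)
    used_pct = 100.0 * allocated / pool_gb
    alerts = []
    if presented > pool_gb:
        # Overcommitment is common with thin provisioning, but it means the
        # pool can fill up before any OS believes its volume is full.
        alerts.append(f"overcommitted: {presented} GB presented, {pool_gb} GB physical")
    if used_pct >= warn_pct:
        alerts.append(f"pool {used_pct:.0f}% allocated")
    return alerts


print(check_pool(1000, [(500, 300), (500, 250), (500, 300)]))
```

Three 500 GB thin volumes on a 1,000 GB pool raise both alerts: the pool is overcommitted, and 85% of its physical space is already allocated.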

Storage Architecture and Management Is Key to Hyper‐V

You’ve seen the point made over and over that installing the very basics of Hyper‐V is exceedingly simple; the real skill comes in creating a Hyper‐V infrastructure that can survive the many possible failures that can and will occur in a production computing environment. Preventing those failures requires the right combination of a good architecture and the capability to accomplish needed management activities without service interruption. You’ve learned about these needs in this chapter.

But this chapter’s discussion on storage capabilities has left one element remaining. You now understand how your iSCSI storage should be architected to ensure the highest levels of availability. But you haven’t really come to understand the special needs that arise when an entire site goes down. Disaster recovery is the theme of the fourth and final chapter. Coming up, you’ll learn about the technologies and techniques you’ll need to consider when you expand your operations to a full disaster recovery site.

Chapter 4: The Role of Storage in Hyper‐V Disaster Recovery

You’ve learned about the power of iSCSI in Microsoft virtualization. You’ve seen the various ways in which iSCSI storage is connected into Hyper‐V. You’ve learned the best practices for architecting your connections along with the smart features that are necessary for 100% storage uptime. You’ve now got the knowledge you need to be successful in architecting iSCSI storage for Hyper‐V.

With the information in this guide’s first three chapters it becomes possible to create a highly‐available virtual infrastructure atop Microsoft’s virtualization platform. With it, you can create and manage virtual machines with the assurance that they’ll survive the loss of a host, a connection, or any of the other outages that happen occasionally within a data center.

Yet this knowledge remains incomplete without a look at one final scenario: the complete disaster. That disaster might be something as substantial as a Category 5 hurricane or as innocuous as a power outage. But in every scenario, the end result is the same: You lose the computing power of an entire data center.

Important to recognize here is that the techniques and technologies that you use in preparing for a complete disaster are far, far different than those you implement for high availability. Disaster recovery elements are added to a virtual environment as an augmentation that protects against a particular type of outage.

Defining “Disaster”

Before getting into the actual click‐by‐click installation of Hyper‐V disaster recovery, it is important first to understand what actually makes a disaster. Although the term “disaster” finds itself greatly overused in today’s sensationalist media (“Disaster in the South: News at 11.”), the actual concept of disaster in IT operations has a very specific meaning.

There are many technical definitions of “disasters” that exist, one of which your organization’s process framework likely leverages to functionally define when a disaster has occurred. Rather than relying on any of the technical definitions, however, this chapter will simply consider a disaster for IT operations to be an event that fully interrupts the operations of a data center.

Using this definition, you can quickly identify what kinds of events can be considered a disaster:

• A naturally‐occurring event, such as a tornado, flood, or hurricane, impacts your data center and causes damage; that damage causes the entire processing of that data center to cease

• A widespread incident, such as a water leakage or long‐term power outage that interrupts the functionality of your data center for an extended period of time

• An extended loss of communications to a data center, often caused by external forces such as utility problems, construction, accidentally severed cabling, and so on

Although disasters are most commonly associated with the types of events that end up on the news, newsworthy disasters are in fact quite rare. In reality, events like those in the second and third items of the previous list are much more likely to occur. All of these events interrupt a data center’s operations, but natural disasters like those in the first item bring the kind of large‐scale damage that requires greater effort to fix.

It is important to define disasters in this way because those above are handled in much different ways than simple service outages. Consider the following set of incidents that are problematic and involve outage but are in no way disasters:

• A problem with a virtual host creates a “blue screen of death,” immediately ceasing all processing on that server

• An administrator installs a piece of code that causes problems with a service, shutting down that service and preventing some action from occurring on the server

• An issue with power connections causes a server or an entire rack of servers to inadvertently and rapidly power down

The primary difference between these types of events and the more classic “disasters” relates to the actions that must occur to resolve the incident. In none of these three incidents have the operations of the data center been fully interrupted. Rather, in each, a problem has affected only a portion of the data center: a server, a service, or a rack.

This differentiation is important because a business’ decision to declare a disaster and move to “disaster operations” is a major one. And the technologies that are laid into place to act upon that declaration are substantially different (and more costly) than those used to create simple high availability. In the case of a service failure, you are likely to leverage your high‐availability features such as Live Migration or automatic server restart. In a disaster, you will typically find yourself completely moving your processing to an alternative site. The failover and failback processes are big decisions with potentially big repercussions.

Defining “Recovery”

Chapter 3 started this guide’s conversation on disaster recovery through its iterative discussion on the features that are important to Hyper‐V storage. There, a graphic similar to Figure 4.1 was shown to explain how two different iSCSI storage devices could be connected across two different sites to create the framework for a disaster recovery environment.

Figure 4.1: The setup of two different SANs in two different sites lays the framework for Hyper‐V disaster recovery.

In Figure 4.1, you can see how two different iSCSI storage devices have been interconnected. The device on the left operates in the primary site and handles the storage needs for normal operations. On the right is another iSCSI storage device that contains enough space to hold a copy of the necessary data for disaster operations. Between these two storage devices is a network connection of high‐enough bandwidth to ensure that the data in both sites remains synchronized.

This architecture is important because at its very core virtualization makes disaster recovery far more possible than ever before. Virtualization’s encapsulation of servers into files on disk makes it both operationally feasible and affordable to replicate those servers to an alternative location.

At a very high level, disaster recovery for virtual environments is made up of three basic things:

• A storage mechanism

• A replication mechanism

• A target for receiving virtual machines and their data

The storage mechanism used by a Hyper‐V virtual environment (or, really, any virtual environment) is the location where each virtual machine’s disk files are contained. Because the state of those virtual machines is fully encapsulated by those disk files, it becomes trivial to replicate them to an alternative location. With technology within the storage device, at the host, or a combination of both, creating a fully‐functional secondary site is, at first blush, as trivial as a file copy.

Now, obviously there are many factors that go into making this “file copy” actually functional in a production environment. There are different types of replication approaches that focus on performance or prevention of data loss. There are clustering mechanisms that actually enable the failover as well as failback once the primary site is returned to functionality. There are also protective technologies that ensure data is properly replicated to the alternative site such that it is crash‐consistent and application‐consistent. All of these technologies you will need to integrate when creating your own recovery solution.

The Importance of Replication, Synchronous and Asynchronous Without delving into the finer details of how this architecture is constructed, a primary question that must first be answered relates to how that synchronization is implemented. Remember that above all else, an iSCSI storage device is at its core just a bunch of disks. Those disks have been augmented with useful management functions to make them easier to work with (such as RAID, storage virtualization, snapshots, and so on), but at its most basic, a storage device remains little more than disk space and a connection.

This realization highlights the importance of how these two storage devices remain in sync with each other. Remember that the sole reason for this second storage device’s existence is to create a second copy of production data comprised of both virtual machine disk files and the data those virtual machines work with. Thus, the mechanism by which data is replicated from primary to backup site (and, eventually, back) is important to how disaster recovery operations are initiated.

Two types of replication approaches are commonly used in this architecture to get data migrated between storage devices. Those two types are generically referred to as synchronous and asynchronous replication. Depending on your needs for data preservation as well as the resources you have available, you may select one or the other of these two options. Or, both.

Synchronous Replication

In synchronous replication, changes to data are made on one node at a time. Those changes can be the writing of raw data to disk by an application or the change to a virtual machine’s disk file as a result of its operations. When data is written using synchronous replication, that change is first enacted on the primary node and then subsequently made on the secondary node. Important to recognize here is that the change is not considered complete until the change has been made on both nodes. By requiring that data is assuredly written on both nodes before the change is complete, the environment can also ensure that no data will be lost when an incident occurs.

Consider the following situation: A virtual machine running Microsoft Exchange is merrily doing its job, responding to Outlook clients and interacting with its Exchange data stores. That virtual machine’s disk files and data stores are replicated using synchronous replication to a second storage device in another location. Every disk transaction that occurs with the virtual machine requires the data to be changed at both the primary and secondary site before the next transaction can occur.

Figure 4.2 shows a breakdown of the steps required for this synchronous replication to fully occur. In this situation, the Exchange server makes a change to its database. That change is first committed at the primary site. It is then replicated to the secondary site, where it is committed to the storage device in that location. An acknowledgement of commitment is finally sent back to the primary site, upon which both storage devices can then move on to the next change.

Figure 4.2: A breakdown of the steps required for synchronous replication.

This kind of replication very obviously ensures that every piece of data is assuredly written before the next data change can be enacted. At the same time, you can see how those extra layers of assurance can create a bottleneck for the secondary site. As each change occurs, that change must be acknowledged across both storage devices before the next change can occur.
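The four steps in Figure 4.2 can be modeled directly. This hypothetical Python sketch uses in‐memory dictionaries for the two storage devices and assumed per‐write commit times and round‐trip time. It shows why a write is not complete, from the application's point of view, until the acknowledgement returns from the backup site.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    commit_ms: float                      # assumed time to commit one write
    store: dict = field(default_factory=dict)

    def commit(self, key: str, value: bytes) -> float:
        self.store[key] = value
        return self.commit_ms


@dataclass
class SyncReplicator:
    primary: Site
    backup: Site
    rtt_ms: float                         # assumed round trip between sites

    def write(self, key: str, value: bytes) -> float:
        """Apply one write; return the latency the application observes."""
        elapsed = self.primary.commit(key, value)    # 1. commit at primary
        elapsed += self.rtt_ms / 2                   # 2. send change to backup
        elapsed += self.backup.commit(key, value)    # 3. commit at backup
        elapsed += self.rtt_ms / 2                   # 4. acknowledgement returns
        return elapsed                               # only now is it complete


repl = SyncReplicator(Site("primary", 0.5), Site("backup", 0.5), rtt_ms=2.0)
total = sum(repl.write(f"edb-page-{n}", b"...") for n in range(3))
print(total)
```

In this model, each write costs 3 ms instead of the 0.5 ms a local commit alone would take, and because each write waits for the previous acknowledgement, three writes take 9 ms. That added wait is the bottleneck described above.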

Synchronous replication works exceptionally well when the connection between storage devices has very high bandwidth. Gigabit connections combined with short distances between devices reduce the intrinsic latency in this architecture. Because it is the only approach that assures every change exists at both sites before the next one proceeds, environments that require zero data loss in the event of a disaster will need to leverage synchronous replication.

Asynchronous Replication

Asynchronous replication, in contrast, does not require data changes to occur in lock‐step between sites. Using asynchronous replication between sites, changes that occur to the primary site are configured to eventually be written to the backup site.

Leveraging preconfigured parameters, changes that occur to the primary site are queued for replication to the backup site as appropriate. This queuing of disk changes between sites enables the primary site to continue operating without waiting for each change’s commitment and acknowledgement at the backup site. The result is no loss of storage performance as a function of waiting for replication to complete.

Although asynchronous replication eliminates the performance penalties sometimes seen with synchronous replication, it does so by also eliminating the assurance of zero or nearly zero data loss. In Figure 4.3, you can see how changes at the primary site are queued up for eventual transfer to the backup site. Using this approach, changes can be submitted in batches as bandwidth allows; however, a disaster that occurs between change replication intervals will cause some loss of the queued data.

Figure 4.3: The steps involved with asynchronous replication.
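The queuing behavior, and the risk that comes with it, can be sketched as follows. This hypothetical model keeps committed‐but‐unshipped changes in an ordered queue. Here, `replicate` stands in for whatever schedule or bandwidth limit ships batches, and `disaster` shows which changes are lost if the primary site disappears between intervals.

```python
from collections import deque


class AsyncReplicator:
    def __init__(self):
        self.primary = {}
        self.backup = {}
        self.queue = deque()  # committed at primary, not yet at backup

    def write(self, key, value):
        self.primary[key] = value        # complete once the primary commits
        self.queue.append((key, value))  # shipped later, in write order

    def replicate(self, max_changes=None):
        """Ship queued changes to the backup site in their original order."""
        if max_changes is None:
            n = len(self.queue)
        else:
            n = min(max_changes, len(self.queue))
        for _ in range(n):
            key, value = self.queue.popleft()
            self.backup[key] = value

    def disaster(self):
        """The primary site is lost: anything still queued never arrives."""
        lost = list(self.queue)
        self.queue.clear()
        return lost


repl = AsyncReplicator()
repl.write("order-1001", "paid")
repl.write("order-1002", "shipped")
repl.replicate()                  # scheduled batch runs
repl.write("order-1003", "paid")  # committed locally, still queued
print(repl.disaster(), repl.backup)
```

The application never waits on the backup site, but the third change exists only at the primary site when the disaster strikes, so it is lost.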

Although the idea of “eventual replication” might seem scary in terms of data integrity, it is in fact an excellent solution for many types of disaster recovery scenarios. To give you an idea, turn back a few pages and take another look at the types of incidents that this chapter considers to be disasters. In either of these classes of events, the level of impact to the production data center facility is enormous. At the same time, those same types of disasters are likely to cause an impact to the people who work for the business as well.

For example, a natural disaster that impacts a data center is also likely to impact the brick‐and‐mortar offices of the business. This impact may impede the ability of employees to get the job of the business done. As a result, a slight loss in data may be insignificant when compared with the amount of business data that is saved, that will be used in the immediate term, and that can be reconstructed by other means.

Which Should You Choose?

To summarize the discussed concepts, remember always that synchronous replication has the following characteristics:

• Assures no loss of data

• Requires a high‐bandwidth and low‐latency connection

• Write and acknowledgement latencies impact performance

• Requires shorter distances between storage devices

In contrast, asynchronous replication solutions have the following characteristics:

• Potential for loss of data during a failure

• Leverages smaller‐bandwidth connections, more tolerant of latency

• No performance impact to source server processing

• Potential to stretch across longer distances

Your decision about which type of replication to implement will be determined primarily by your Recovery Point Objective (RPO), and secondarily by the amount of distance you intend to put between your primary and secondary sites.

Recovery Point Objective

RPO is a measurement of your business’ tolerance for acceptable data loss for a particular service, and is formally defined as “the point in time to which you must recover data as defined by your organization.” Business services that are exceptionally intolerant of data loss are typified by production databases, critical email stores, or line of business applications. These services and applications cannot handle any loss of data for reasons based on business requirements, compliance regulations, or customer satisfaction. For these services, even the most destructive of disasters must be mitigated against because the loss of even a small amount of data will significantly impact business operations.
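For asynchronous replication, a rough worst case is easy to estimate: a change made just after a batch is cut waits a full replication interval, plus the time to transfer that batch, before it is safe at the backup site. The following sketch applies that simplified model to compare a replication schedule against a service's RPO. The function names and figures are hypothetical, and the model ignores the queue backlogs that build up when bandwidth falls behind.

```python
def worst_case_loss_min(interval_min: float, transfer_min: float) -> float:
    # A change just after a batch closes waits one interval, then the transfer.
    return interval_min + transfer_min


def meets_rpo(rpo_min: float, interval_min: float, transfer_min: float) -> bool:
    return worst_case_loss_min(interval_min, transfer_min) <= rpo_min


# Replicating every 15 minutes, with batches taking 10 minutes to transfer:
print(meets_rpo(rpo_min=60, interval_min=15, transfer_min=10))
print(meets_rpo(rpo_min=15, interval_min=15, transfer_min=10))
```

A service whose RPO is zero cannot be met by any realistic asynchronous schedule, since the interval and transfer time are never truly zero. That is why such services point toward synchronous replication.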

You’ll notice here that this definition does not talk about the RPO of your business but rather the RPO of particular business services. This is an important differentiation as well as one that requires special highlighting. Remember that every business has services that it considers to be Tier I or “business critical”. Those same businesses have other services that they consider to be Tier II or “moderately important” as well as others that are Tier III or “low importance.”

This differentiation is critically important because although virtualization indeed makes disaster recovery operationally feasible for today’s business, disaster recovery still represents an added cost. Your business might see the need for getting its production database back online within seconds, but it likely won’t need the same attention for its low‐importance WSUS servers or test labs.

Distance Between Sites

Remember too that synchronous replication solutions require good bandwidth between sites. At the same time, they are relatively intolerant of latency on those connections. Thus, the physical distance between sites becomes another factor in determining which solution you will choose.
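The distance penalty can be estimated with simple arithmetic. Light in optical fiber travels at roughly 200,000 km per second, or about 5 microseconds per kilometer, and a synchronous write pays that delay twice: once out to the backup site and once back with the acknowledgement. The sketch below is a back‐of‐the‐envelope model only. Its `path_factor` (how much longer the real cable route is than the straight‐line distance) and `equipment_ms` (switch, router, and storage processing delay) are assumed inputs that you would replace with measured values.

```python
FIBER_US_PER_KM = 5.0  # ~200,000 km/s propagation speed in optical fiber


def added_write_latency_ms(distance_km: float, path_factor: float = 1.0,
                           equipment_ms: float = 0.0) -> float:
    """Extra latency each synchronous write incurs from inter-site distance."""
    one_way_us = distance_km * path_factor * FIBER_US_PER_KM
    return 2 * one_way_us / 1000 + equipment_ms


for km in (10, 100, 1000):
    print(km, added_write_latency_ms(km))
```

Even before equipment delays, a backup site 1,000 km away adds about 10 ms to every write, compared with about 1 ms at 100 km. Because synchronous replication applies that penalty to each change, distance quickly favors asynchronous replication for far‐away sites.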

Of the different types of disasters, natural disasters tend to have the greatest impact on this decision. For example, to protect against a natural disaster like a Category 5 hurricane, you likely want your backup site to sit in a geographic location that is greater than the expected diameter of said hurricane. At the same time, Category 5 hurricanes are relatively rare events, while other events like extended power outages are much more likely.

It is for these reasons that combinations of synchronous, asynchronous, and even non‐replication for your servers can be an acceptable solution. Some of your servers need to stay up no matter what, while others can wait for the disaster to end and normal operations to return. Others can be protected against low‐impact disasters through short‐distance synchronous replication, while a tertiary site located far away protects against the worst of natural cataclysms. In all of these, cost and benefit will be your guide.
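That mix can be captured as a per‐service policy. In the hypothetical sketch below, each service carries its own RPO (or `None` for services that can wait for the primary site to return), and `sync_limit_km` is an assumed maximum distance for synchronous replication. The right limit for your environment depends on your latency tolerance and your storage vendor's guidance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Service:
    name: str
    rpo_min: Optional[float]  # None: no replication; wait out the disaster


def replication_plan(svc: Service, backup_site_km: float,
                     sync_limit_km: float = 100.0) -> str:
    if svc.rpo_min is None:
        return "none"
    if svc.rpo_min == 0:
        if backup_site_km <= sync_limit_km:
            return "synchronous"
        # Zero data loss over long distance: sync nearby plus a distant copy.
        return "synchronous to a nearby site, asynchronous to the distant site"
    return "asynchronous"


services = [Service("production SQL", 0), Service("file server", 60),
            Service("WSUS", None), Service("Remote Desktop Server", 15)]
for s in services:
    print(s.name, "->", replication_plan(s, backup_site_km=500))
```

The design choice here is to decide per service rather than per site, which mirrors the point above: cost and benefit differ from one service to the next.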

Note An additional and yet no less important determinant here relates to your support servers. When considering which virtual servers to enable for disaster recovery, remember to also make available those that provide support services. You don’t want to experience a disaster, fully fail over, and find yourself without domain controllers to run the domain or Remote Desktop Servers to connect users to applications.

Ensuring Data Consistency

No discussion on replication is complete without a look at the perils of data consistency. Bluntly put, if you expect to simply file‐copy your virtual machines from one storage device to another, you’ll quickly find that the resulting copies aren’t likely to power on all that well. Nor will their applications and databases be immediately available for use when a disaster strikes.

Data Consistency: An Exchange Analogy

The best way to explain this problem is through a story. Have you or someone in your organization ever accidentally pulled the power cable on your Exchange Server? Or have you ever seen that Exchange Server crash, powering down without a proper shutdown sequence? What happens next? In either situation, the Exchange database does not return to operations as soon as the server is powered back on. Instead, Exchange refuses to start its services, reporting that its database was shut down uncleanly. The only solution when this occurs is a long and painful process of running multiple integrity checks on the database to return it to functionality. Depending on the size of the database, those integrity checks can require multiple hours to complete. During that entire process, your company must operate with a mail system that is not fully functional. It is for this reason that businesses that use Microsoft Exchange add high‐availability features such as battery backup, redundant power supplies, and even database replication to alternative systems.

Now, you might be asking yourself, “How does this story relate to data consistency in replicated virtual environments?” The answer is that without the right technology in place, a poorly‐replicated virtual machine can produce a dirty Exchange database in exactly the same way that a power fault does. In either case, you must implement the right technologies if you’re to prevent that unclean shutdown.

The problem here has to do with the ways in which virtual machine data is replicated from primary site to backup site. Remember that a running virtual machine is also a virtual machine that is actively using its disk files. Thus, any traditional file copy that occurs from a primary site to a backup site will find that the file has changed during the course of the copy. Even ignoring the obvious locked‐file problems that occur with such open files, it becomes easy to see how running virtual machine disk files cannot be replicated without some extra technology in place.
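A tiny simulation makes the problem concrete. In this hypothetical sketch, a virtual disk is a list of blocks copied in order while the running virtual machine updates a log block and a data block as one transaction partway through the copy. The resulting copy matches neither the disk before the transaction nor the disk after it.

```python
def copy_while_running(disk: list, writes_at_step: dict) -> list:
    """Copy disk block by block; writes_at_step[i] lists (block, data)
    writes the running VM makes just before block i is read."""
    copy = []
    for i in range(len(disk)):
        for blk, data in writes_at_step.get(i, []):  # the VM keeps working
            disk[blk] = data
        copy.append(disk[i])
    return copy


disk = ["log:txn41", "os", "app", "data:txn41"]
before = list(disk)
# One transaction updates both the log and the data block mid-copy.
copy = copy_while_running(disk, {2: [(0, "log:txn42"), (3, "data:txn42")]})
print(copy)
print(copy == before, copy == disk)
```

The copy holds the old log entry but the new data block, a combination that never existed on the running disk. That mismatch is exactly what an application sees as an unclean, inconsistent database.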

Further complicating this problem are the applications that are running within that virtual machine itself. Consider Exchange once again as an example, although the issue exists within any installed transactional database. With a Microsoft Exchange data store, its .EDB file on disk behaves very much like a virtual machine’s disk file. In essence, although it may be possible to copy that .EDB file from one location to another, you can only be guaranteed a successful copy if the Exchange server is not actively using the file. If it is, changes are likely to occur during the course of the transfer that result in a corrupted database.

It is for both of these reasons that extra technology is required at one or more levels of the infrastructure to manage the transfer between primary and secondary sites. This technology commonly uses one of many different snapshotting technologies to watch for and transfer changes to virtual machines and their data as they occur.

Data integration technologies often require the installation of extension software to either the Hyper‐V cluster or the individual virtual machines. This software commonly integrates with the onboard Volume Shadow Copy Service and its application‐specific writers to create and work with dynamic snapshots of virtual machines and their installed applications. The result is much the same as what is seen with traditional application backup agents that integrate with applications like Exchange, SQL Server, and others to successfully gather backups from running application instances. The difference here is that instead of gathering backups for transfer to tape, these solutions are gathering changes for replication to a backup site.
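The sequence these agents coordinate can be sketched in miniature. The hypothetical `App` class below buffers writes in memory the way a transactional application might, and the `quiesced` context manager mirrors the freeze and thaw phases that VSS coordinates with application writers: flush pending writes, hold new ones, capture the point‐in‐time copy, then release. None of these names are VSS APIs. The sketch only shows why a copy taken without that coordination can miss data the application has accepted but not yet written to disk.

```python
from contextlib import contextmanager


class App:
    def __init__(self):
        self.buffer = []   # accepted writes not yet on disk
        self.disk = []     # what a storage-level copy would capture
        self.held = []     # writes arriving while frozen
        self.frozen = False

    def write(self, record: str) -> None:
        (self.held if self.frozen else self.buffer).append(record)

    def flush(self) -> None:
        self.disk.extend(self.buffer)
        self.buffer.clear()


@contextmanager
def quiesced(app: App):
    app.flush()          # freeze: push buffered writes to disk...
    app.frozen = True    # ...and hold new writes during the copy
    try:
        yield
    finally:
        app.frozen = False           # thaw: release held writes
        app.buffer.extend(app.held)
        app.held.clear()


def consistent_copy(app: App) -> list:
    with quiesced(app):
        return list(app.disk)


app = App()
app.write("txn-1")
print(list(app.disk))        # naive storage-level copy
print(consistent_copy(app))  # coordinated copy
```

The naive copy misses the buffered transaction, while the coordinated copy captures it. Writes that arrive during the freeze are held rather than lost and resume once the copy is taken.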

Other solutions exist purely at the level of the storage device. These solutions leverage on‐device technology to ensure that data is replicated consistently and in the proper order. Storage device‐centric solutions can be less complex: with them, installing agents on each virtual host or machine may not be required. Also, fewer “moving parts” are exposed to the administrator, allowing administrators to enable replication on a per‐device or per‐volume basis with the assurance that it will operate successfully with minimal further interaction. Depending on your environment, one or both of these solutions may be necessary to meet your replication needs.


Note When considering a secondary storage device for disaster recovery purposes, you must take into account the extra technologies required to ensure data consistency. In essence, if your backup site cannot automatically fail over without extra effort, you don’t have a complete disaster recovery solution.

Architecting Disaster Recovery for Hyper-V

All of this introductory discussion brings this conversation to the main topic of how to actually enable disaster recovery in Hyper-V. You'll find that the earlier discussion on storage devices and replication is fundamentally important for this architecture. Why? Because creating disaster recovery for Hyper-V involves stretching your Hyper-V cluster to two, three, or even more sites and implementing the necessary replication. The first half of accomplishing this is very similar to the cluster creation first introduced in Chapter 2.

Note As in Chapter 2, this guide will not detail the exact click-by-click steps necessary to build such a cluster. That information is better left for the step-by-step guide that is available on Microsoft's Web site at http://technet.microsoft.com/en-us/library/cc732488(WS.10).aspx.

Microsoft's terminology for a Hyper-V cluster that supports disaster recovery is a multi-site cluster, although the terms stretch cluster and geocluster have also been used to describe the same architecture. By definition, a Microsoft multi-site cluster is a traditional Windows Failover Cluster that has been extended so that different nodes in the same cluster reside in separate physical locations.

Figure 4.4 shows a network diagram of the same cluster that was first introduced in Figure 4.1. In Figure 4.4, you can see how the high-availability elements that were added into the single-site cluster have been mirrored within the backup site.


Figure 4.4: A network diagram of a multisite cluster that includes high-availability elements.

Full Redundancy Isn't Always Necessary at the Backup Site

This mirroring of high-availability elements is shown for completeness; however, it is not uncommon for backup site servers to use fewer redundancy features than the production site does. The reason for this reduction lies in the cluster's purpose: Backup sites are most commonly used for disaster operations only, which often make up a small percentage of total operations, so the cost of full redundancy often outweighs its benefit. As you factor in how long you expect to run virtual machines at the backup site, you may find that your architecture needs fewer of these features.

Important to recognize in this figure is the additional iSCSI storage location within the backup site. Multi-site Hyper-V clusters use local, replicated storage within each site. Although a multi-site Windows Failover Cluster generally requires each site's storage to be local to that site, the cluster service itself provides no built-in mechanism for replicating that storage. You must turn to a third-party provider, commonly either your storage vendor or an application provider, for replication services between storage devices.

Note Although Microsoft offers a replication solution in Distributed File System Replication (DFS-R), that solution is neither appropriate nor supported as a cluster replication mechanism. DFS-R replicates a file only after it is closed, which rarely happens with the disk files of running virtual machines. Thus, it cannot operate as a cluster replication solution.


Choosing the Right Quorum

In Windows Server 2008, Microsoft eliminated the earlier requirement that all cluster nodes reside on the same subnet. That requirement complicated the installation of multi-site clusters because extending subnets across sites was complex, or even impossible, in many companies. Today, the click-by-click process of creating a cluster across sites requires little more than installing Windows Failover Clustering onto each node and configuring the node appropriately.

Clicking the buttons may be trivial, but designing the cluster's architecture is where the greatest complexity lies. One of the first decisions you must make concerns how the cluster determines whether it is still a cluster. This determination is made through a process of obtaining quorum.

Obtaining quorum in a Hyper-V cluster is not unlike how your local Kiwanis or Rotary club obtains quorum at its weekly meetings. If you've ever been part of a club where decisions are voted on, you're familiar with this process. Consider the analogy: Decisions that are important to a Kiwanis club should be voted on by a large enough number of club members. The club's bylaws document a process (usually based on the rules of Parliamentary Procedure) that explains how many members must be present for an important item to be voted on. That number is commonly 50% of the total members plus one. Without this number of members present, the club cannot vote on important matters, because it does not see itself as a fully functioning club.

The same holds true in Hyper-V clusters. Remember first that a cluster is, by definition, always prepared for the loss of one or more hosts. Thus, it must always be on the lookout for conditions in which too few members survive for it to remain a cluster. This count of surviving members is referred to as the cluster's quorum. And just as different Kiwanis clubs can use different mechanisms to measure quorum, there are different ways for your Hyper-V cluster to determine whether it has quorum. Windows Server 2008 defines four quorum models.
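The club's "half the members plus one" rule is a strict-majority threshold, and the quorum models that follow all count votes against that kind of threshold. As a minimal sketch (the function names are illustrative, not part of any Windows API), integer division computes it cleanly, including for odd totals where "50% plus one" would otherwise yield a fraction:

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is more than half of total_votes."""
    return total_votes // 2 + 1

def has_quorum(present_votes: int, total_votes: int) -> bool:
    """True when the votes present form a strict majority."""
    return present_votes >= votes_needed(total_votes)

print(votes_needed(20))    # 11: half of 20, plus one
print(votes_needed(21))    # 11: already more than half of 21
print(has_quorum(10, 20))  # False: exactly half is not a majority
```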

Node and Disk Majority

In the Node and Disk Majority model, each node gets a quorum vote, as does the shared disk. Here, a single-site four-node cluster would have five votes: one for each of the nodes plus one for its shared storage. Although useful for single-site clusters that have an even number of nodes, Node and Disk Majority is not a recommended quorum model for multi-site clusters, because replicated shared storage introduces a number of challenges there. The replication process can cause problems with SCSI commands issued across multiple nodes. Also, storage must be replicated in real-time synchronous mode across all sites for the disks to retain the proper awareness.
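Why the disk vote helps an even-numbered single-site cluster can be checked by enumerating failures. This is a minimal sketch that assumes a hypothetical four-node cluster, treats the shared witness disk as always online, and counts one vote per node plus one for the disk; it ignores how the cluster actually arbitrates ownership of the disk.

```python
from itertools import combinations

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical four-node cluster

def has_quorum(present: int, total: int) -> bool:
    # Strict majority: more than half of all configured votes.
    return 2 * present > total

def survives(failed: set, use_disk_witness: bool) -> bool:
    total = len(NODES) + (1 if use_disk_witness else 0)
    present = len([n for n in NODES if n not in failed])
    if use_disk_witness:
        present += 1  # assumes the shared witness disk itself stays online
    return has_quorum(present, total)

# Every possible two-node failure in the four-node cluster.
two_node_failures = [set(pair) for pair in combinations(NODES, 2)]
print(all(survives(f, use_disk_witness=True) for f in two_node_failures))   # True: 3 of 5
print(any(survives(f, use_disk_witness=False) for f in two_node_failures))  # False: 2 of 4
```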

Disk Only Majority

In the Disk Only Majority model, only the individual storage devices have votes in the quorum determination. This model was used extensively in Windows Server 2003, and although it is still available in Windows Server 2008, it is not a recommended configuration for either single-site or multi-site clusters today.


Node Majority

In the Node Majority model, only the individual cluster nodes have votes in the quorum determination. It is strongly suggested that environments using this model have three or more nodes in single-site clusters, and only an odd number of nodes in multi-site clusters. Clusters using this model should also be configured so that the primary site contains more nodes than the secondary site. Further, the Node Majority model is not recommended when a multi-site cluster is spread across more than two sites.

The reason for these recommendations has to do with how votes can be counted by the cluster in various failure conditions. Consider a two‐site cluster that has five nodes, three in the primary site and two in the secondary site. In this configuration, the cluster will remain active even with the loss of any two of the nodes. Even if the two nodes in the secondary site are lost, the three nodes in the primary site will remain active because three out of five votes can be counted.
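The five-node example can be verified by brute force. This is a minimal sketch of the Node Majority vote count only, assuming a hypothetical layout with nodes named p1 to p3 in the primary site and s1 and s2 in the secondary site; it does not model how Windows detects failed nodes.

```python
from itertools import combinations

# Hypothetical layout matching the example: three primary nodes, two secondary nodes.
SITES = {"primary": ["p1", "p2", "p3"], "secondary": ["s1", "s2"]}
ALL_NODES = [node for nodes in SITES.values() for node in nodes]

def survivors(failed) -> list:
    """Nodes still running after the given nodes fail."""
    return [node for node in ALL_NODES if node not in failed]

def has_quorum(surviving: list) -> bool:
    """Node Majority: one vote per node; a strict majority of configured nodes is required."""
    return 2 * len(surviving) > len(ALL_NODES)

# Every possible two-node failure still leaves three of five votes.
print(all(has_quorum(survivors(pair)) for pair in combinations(ALL_NODES, 2)))  # True

# Losing the whole secondary site: the primary site keeps 3 of 5 votes.
print(has_quorum(survivors(SITES["secondary"])))  # True

# Losing the whole primary site: the secondary site holds only 2 of 5 votes.
print(has_quorum(survivors(SITES["primary"])))  # False
```

The last two lines show the trade-off behind the recommendation: only the site holding the majority of nodes can keep the cluster running on its own, which is why that majority belongs in the primary site.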

Node and File Share Majority

The Node and File Share Majority model adds a separate file share witness to the Node Majority model. Here, a file share on a server separate from the cluster is given one additional vote in the quorum determination. It is recommended that the file share be located in a site that is not occupied by any of the cluster nodes. If no additional site exists, it is possible to locate the witness file share within the primary site; however, placing it there does not provide the level of protection gained by using a completely separate site.

Adding the file share witness to the quorum determination specifically helps multi-site clusters arbitrate quorum when entire sites are down. Because the loss of an entire site also results in the loss of network connectivity to all hosts at that site, the cluster can experience a condition known as "split brain," in which multiple sites each believe that they have enough votes to remain an active cluster. This is undesirable because each isolated, independent site continues operating under the assumption that the other nodes are down, creating problems when those nodes become available again. Including the file share witness in the quorum determination ensures that an entire site loss cannot create a split-brain condition, no matter how many nodes are present in the cluster.
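The effect of the witness on an evenly split two-site cluster can be sketched with simple vote counting. This is a minimal sketch under stated assumptions: a hypothetical cluster with two nodes per site, votes tallied per site rather than per node, and a witness share that only one partition can claim at a time (the sketch models that by giving the witness to exactly one side).

```python
def has_quorum(reachable: set, voters: dict) -> bool:
    """Strict majority of all configured votes among the voters this partition can reach."""
    total = sum(voters.values())
    present = sum(voters[name] for name in reachable)
    return 2 * present > total

# Hypothetical two-site cluster with two nodes per site; the witness lives in a third site.
NO_WITNESS = {"siteA": 2, "siteB": 2}
WITH_WITNESS = {"siteA": 2, "siteB": 2, "witness": 1}

# Site A is lost entirely.
print(has_quorum({"siteB"}, NO_WITNESS))               # False: 2 of 4 votes
print(has_quorum({"siteB", "witness"}, WITH_WITNESS))  # True: 3 of 5 votes

# The inter-site link fails and only site A manages to claim the witness.
print(has_quorum({"siteA", "witness"}, WITH_WITNESS))  # True: 3 of 5 votes
print(has_quorum({"siteB"}, WITH_WITNESS))             # False: 2 of 5 votes
```

Because the witness vote can land on only one side, at most one partition reaches a majority, so the two sites cannot both keep running as the cluster.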

Further, the Node and File Share Majority model makes it possible to extend clusters to more than two sites. A single file share in an isolated site can function as the witness for multiple clusters. Figure 4.5 shows a network diagram of how a witness file share can be used to keep a multi-site cluster resilient even through the loss of any single site.


Figure 4.5: Introducing a Witness Server further protects a multisite cluster from a site failure.

Obtaining Quorum

If you are considering a multi-site cluster for disaster recovery, you will need to select one of the two recommended quorum models (Node Majority or Node and File Share Majority). That decision will most likely be based on whether an isolated site is available for the witness file share, but other factors can play a part as well. The actual process of obtaining quorum happens entirely under the covers within the Windows Failover Cluster service. To give you some idea of the technical details of this process, Microsoft identifies on its Web site at http://technet.microsoft.com/en-us/library/cc730649(WS.10).aspx the high-level phases that cluster nodes use to obtain quorum. Those phases are reproduced here:

As a given node comes up, it determines whether there are other cluster members that can be communicated with (this process may be in progress on multiple nodes simultaneously).

Once communication is established with other members, the members compare their membership “views” of the cluster until they agree on one view (based on timestamps and other information).

A determination is made as to whether this collection of members “has quorum” or, in other words, has enough members that a “split” scenario cannot exist. A “split” scenario would mean that another set of nodes that are in this cluster was running on a part of the network not accessible to these nodes.

If there are not enough votes to achieve quorum, the voters wait for more members to appear. If there are enough votes present, the Cluster service begins to bring cluster resources and applications into service.

With quorum attained, the cluster becomes fully functional.
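The last two phases, waiting for enough votes and then bringing resources into service, can be pictured as a simple loop. This is a deliberately simplified sketch, not the Cluster service's actual algorithm: it assumes the Node Majority model, hypothetical node names, and collapses member discovery and view agreement into "a node joins."

```python
def bring_up(arrival_order: list, configured_nodes: list):
    """Yield (node, state) as each node joins; resources start only once quorum exists."""
    present = []
    for node in arrival_order:
        present.append(node)  # discovery and view agreement collapsed into one step
        if 2 * len(present) > len(configured_nodes):
            yield node, "quorum: bringing resources online"
        else:
            yield node, "no quorum: waiting for more members"

CONFIGURED = ["n1", "n2", "n3", "n4", "n5"]  # hypothetical five-node cluster
for node, state in bring_up(["n1", "n2", "n3"], CONFIGURED):
    print(node, state)
# n1 no quorum: waiting for more members
# n2 no quorum: waiting for more members
# n3 quorum: bringing resources online
```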

Ensuring Network Connectivity and Resolution

The final step in architecting your Hyper-V cluster is ensuring that proper networking and name resolution are present at every site to which a virtual machine may fail over. Multi-subnet support in Windows Failover Clusters makes this significantly easier, because it eliminates the complex (and sometimes impossible) networking configurations required to stretch a subnet across sites.

This is a powerful new feature. At the same time, however, using multiple subnets in a failover cluster means that virtual machines must be configured to retain network connectivity as they move between sites. For example, each virtual machine's addressing must be configured so that its IP address, subnet mask, gateway, and DNS servers all remain valid at every site it might move to. Alternatively, DHCP and dynamic DNS can be used to automatically re-address virtual machines when a failover event occurs.
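Checking whether a static configuration survives a move between subnets is mechanical. This minimal sketch uses Python's standard ipaddress module with hypothetical per-site subnets; it checks only the address, subnet mask, and gateway, since DNS servers are normally reached through routing and are not tied to the local subnet.

```python
import ipaddress

# Hypothetical subnets for each site; substitute your own addressing plan.
SITE_SUBNETS = {
    "primary": ipaddress.ip_network("10.1.0.0/24"),
    "backup": ipaddress.ip_network("10.2.0.0/24"),
}

def static_config_works_at(site: str, address_with_prefix: str, gateway: str) -> bool:
    """True if the VM's static IP, subnet mask, and gateway all fit the site's subnet."""
    iface = ipaddress.ip_interface(address_with_prefix)  # e.g. "10.1.0.50/24"
    subnet = SITE_SUBNETS[site]
    return iface.network == subnet and ipaddress.ip_address(gateway) in subnet

def sites_where_config_breaks(address_with_prefix: str, gateway: str) -> list:
    """Sites at which this static configuration would leave the VM unreachable."""
    return [site for site in SITE_SUBNETS
            if not static_config_works_at(site, address_with_prefix, gateway)]

print(sites_where_config_breaks("10.1.0.50/24", "10.1.0.1"))  # ['backup']
```

A non-empty result means the virtual machine would need re-addressing, by hand or through DHCP and dynamic DNS, after failing over to that site.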

Either approach involves some downtime for clients that attempt to connect to virtual machines as they move between sites. The primary connection delay comes from re-convergence of the correct DNS settings on both servers and clients after a failover event. You may need to reduce the Time To Live (TTL) of the relevant DNS records, or flush local caches on clients after DNS entries have been updated, to reconnect clients with moved servers.
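The TTL matters because a client that cached the old record just before the update can keep using it for a full TTL. As a rough, simplified estimate (the figures below are hypothetical, and real delays also depend on resolver behavior and DNS replication), the worst-case window is the time to update DNS plus the record's TTL:

```python
def worst_case_stale_seconds(dns_update_delay_s: int, record_ttl_s: int) -> int:
    """Simplified upper bound on how long a client may keep resolving the old address."""
    # A client that cached the old record just before the update holds it for a full TTL.
    return dns_update_delay_s + record_ttl_s

print(worst_case_stale_seconds(60, 3600))  # 3660: a one-hour TTL dominates
print(worst_case_stale_seconds(60, 300))   # 360: after lowering the TTL to five minutes
```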

Disaster Recovery Is Finally Possible with Hyper-V Virtualization

Although this chapter's discussion on disaster recovery might at first blush appear to describe a complex solution, consider the alternatives of yesteryear. In the days before virtualization, disaster recovery options were limited to creating mirrored physical machines in alternative sites, replicating their data through best-effort means, and manually updating backup servers in lock-step with their primary brethren.


Today's solutions for Hyper-V disaster recovery are still not installed through a Next, Next, Finish process. These architectures remain solutions rather than simple product installations. However, with a smart architecture and planning in place, their implementation and ongoing management are entirely feasible for today's IT professionals. Building them atop iSCSI-based storage further eases implementation and management thanks to iSCSI's network-based roots.

Your next step is to implement what you've learned in this guide. With the knowledge gained in these few pages, you're now ready to augment Hyper-V's excessively simple installation with high-powered high availability and disaster recovery. Whether you need a few servers to host a few virtual machines or a multi-site infrastructure for complete resiliency, iSCSI provides the tools to build the production environment you need.

Download Additional eBooks from Realtime Nexus!

Realtime Nexus, The Digital Library, provides world-class expert resources that IT professionals depend on to learn about the newest technologies. If you found this eBook to be informative, we encourage you to download more of our industry-leading technology eBooks and video guides at Realtime Nexus. Please visit http://nexus.realtimepublishers.com.
