PC Infrastructure - 1 - Overview.pdf

This video is part of the Microsoft® Virtual Academy.

1

This session is going to cover the infrastructure of the private cloud. We’re going to really understand what the critical requirements are when we look at the hardware to design our own private cloud implementation. This particular section is going to provide an overview and talk about some of the planning considerations that we need when designing this initial infrastructure. Later modules are going to go further in analyzing the infrastructure and look at actual deployment configuration as well as long-term management of this core piece of the private cloud fabric. I’m Symon Perriman, a technical evangelist with Microsoft, and with this session we are going to give you a broad foundation of everything you need to get started deploying your private cloud.

2

Over the years technology has seen a number of transformations, from the introduction of the mainframe… First, let’s take a quick look at how we got to where we are today with the current data center. As you may remember, back in the day we started with the mainframe, and since then we have seen a number of transformations.

3

… the client server… After the mainframe came the client-server model: you have a separate machine, and everybody’s connected to the server…

4

… to the world wide web… Then we moved over to the Web and the whole shift there was thinking everything was going to go online.

5

… to virtualization. Virtualization has been a driver of efficiency in IT for nearly a decade. Since then, things have returned to the data center again, once we realized some of the broad benefits of virtualization. And many of you out there are probably at this stage in your deployment model, where you have broad virtualization, so how do you take it to the next step?

6

what will that next transformation be? What is this next step? What will this next transformation be?

7

Well, we believe it is the concept of IT as a service, in which components are released and used on demand as needed. Instead of broad upfront capital expenditures, everything is done when needed. If you need a server you get it on demand rather than having to purchase it six months ahead of time. This concept of IT as a service is our foundation for what we call the private cloud.

8

So let’s take a quick review: what exactly is the private cloud? Well, with the Microsoft private cloud, remember it’s not just something that you can go and buy. The private cloud is not just a technology but a combination of technologies and processes.

9

With the private cloud you’re able to not only have efficient infrastructure, but also manage it seamlessly and automatically. As we break down the private cloud infrastructure, we’ll cover the servers, the storage, and the networking; these components we consider the fabric. We’ll look at some of the software and security considerations for making sure that this infrastructure has a solid foundation and will stay online. And then we’ll move on to some of the more advanced topics as part of this planning, understanding how high availability works with failover clustering, and some of the core requirements needed for virtualization.

10

11

We’ll start by understanding what is really critical for a server as part of this private cloud fabric. First, current Windows Server® editions require a 64-bit processor. (Although the Itanium is a 64-bit processor, we discourage using this architecture, as future versions of the Windows® operating system will no longer ship for it.) The 64-bit processor will provide the ideal foundation for not just this generation of servers but subsequent generations down the line. For the processor, having as much CPU speed, as many sockets, and as many cores as possible is going to be especially important for virtualization. Under virtualization, lots of virtual machines run on a single piece of physical hardware, yet each of these virtual machines has specific requirements. They might need their own virtual CPUs, and to supply so many virtual CPUs, you need to ensure that you have multi-socket and multi-core server deployments. Virtualization support is another consideration for the server. The two main processor manufacturers, AMD and Intel, each have their own technology for hardware-assisted virtualization: AMD Virtualization (AMD-V) and Intel Virtualization Technology (Intel VT). You need to explicitly enable this hardware-assisted virtualization in the BIOS before the hypervisor can use it. Another recommendation is to have Second Level Address Translation (SLAT). With SLAT, the processor translates between guest physical memory and host physical memory in hardware, rather than the hypervisor maintaining those mappings in software. This means that the VMs are going to perform much more efficiently in your infrastructure. The final virtualization requirement is hardware-enforced Data Execution Prevention (DEP). This is another setting which must be enabled in the BIOS. It ensures that unallocated pieces of memory are not executing code, which is designed to protect against malicious attacks such as buffer overflows. By preventing any of that non-allocated memory from being used to run code, you get better isolation between your virtual machines and better security. To meet the storage requirements for the actual host OS, the core server, you need some RAID disk space for the OS partition. Although a general installation of Windows Server 2008 R2 runs about eight to 10 gigabytes (GB), it’s best to make sure that you have additional local storage space, because this can often be used for page files that the virtual machines need for faster performance. You need to have at least 64 GB of RAM; the exact amount will depend on the number of virtual machines that you plan to host on the server. For example, if each virtual machine is running 2 GB of memory, you could probably have about 30 VMs on a 64 GB host. You must always reserve spare memory for the host OS itself as well, so the host can continue to perform and manage all of the VMs that are running on it. Certainly the more RAM you have, the better. Some of this will depend on the workloads that you’re running. Client workloads, such as Windows 7 VDI guest VMs, don’t use very much memory. Yet if you’re running some high-intensity workloads such as Microsoft SQL Server® database software, you’ll need to allocate a lot more RAM. When you look at all of these components, it’s very important to make sure that they all have a Windows Server logo. This is actually a requirement for cluster support.
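
As a quick sanity check before going further, the snippet below queries a host for the capabilities just described: virtualization extensions enabled in the BIOS, SLAT support, hardware DEP, and total RAM. It is a minimal sketch; the processor properties shown are exposed through WMI on Windows Server 2012/Windows 8 and later, so on older hosts you would check the BIOS setup or a tool such as Sysinternals Coreinfo instead.

```powershell
# Report virtualization readiness of the local host (these WMI processor properties
# are exposed on Windows Server 2012 / Windows 8 and later; on older hosts check the
# BIOS setup or a tool such as Sysinternals Coreinfo instead)
Get-WmiObject Win32_Processor |
    Select-Object Name, NumberOfCores, VirtualizationFirmwareEnabled, SecondLevelAddressTranslationExtensions

# Hardware DEP availability as reported by the operating system
Get-WmiObject Win32_OperatingSystem | Select-Object DataExecutionPrevention_Available

# Total physical memory in GB, to compare against your planned VM density
"{0:N0} GB RAM" -f ((Get-WmiObject Win32_ComputerSystem).TotalPhysicalMemory / 1GB)
```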

12

There are a lot of considerations in choosing the right server infrastructure, but the Microsoft Assessment and Planning Toolkit (MAP) can help make this planning easier. MAP is a solution accelerator, which means it is released independently of any product and can be downloaded for free today at www.microsoft.com/map. MAP performs an agentless discovery and inventory to analyze all of the servers in your environment for various factors: which support virtualization, which have enough memory, which need upgrades, and which have enough space left for the host partition. This report provides an easy-to-use informational guide on your current environment and what you need to upgrade to prepare it for the private cloud. Additionally, MAP version 6.0 (the current version) can make recommendations for preconfigured hardware, components, and solutions based on your current implementation. With the Microsoft Hyper-V® Cloud Fast Track program, which has recently been rebranded as the Microsoft Private Cloud Fast Track program, you can actually get an entire end-to-end configured solution in a single unit. The unit comes with a certain number of servers and a certain amount of capacity, and you simply plug in multiple redundant Internet connections to have this new component ready to become part of your private cloud.

13

Now we’ll take a quick look at MAP and see some of its features and functionality. I’ll scan one of the environments in this demo and take a look at some of the reports that come out of it. The first step is to perform an inventory of the organization’s information services infrastructure. Let’s begin our hardware inventory assessment and see MAP use its built-in wizard, which makes gathering all of this data quick and easy. The hardware inventory data will be gathered and stored in a SQL Server database that comes with MAP; we’ll select an existing database called Server Demo. The wizard begins by asking us which MAP inventory scenarios we’re interested in; we want to gather information from Windows-based and VMware computers. For the purpose of this demo we’re going to skip a lot of the inventory configuration steps and go straight to the analysis of all of the inventory results. To look at our results, we expand the Discovery and Readiness tab and click Windows Server 2008 R2 Readiness, and MAP displays a summary of our readiness. As you can see from the green slice of the chart, six servers have already met the recommended system requirements for Windows Server 2008 R2. The teal slice shows that 10 servers are capable of running with the minimum system requirements. We also have 13 servers that could not run Windows Server 2008 R2 at all. But MAP can tell us why, and it can recommend the appropriate hardware upgrades to enable these servers to run the new operating system. Notice that we also have a category called insufficient data; any machine that has a connection issue and cannot be inventoried by MAP is placed into this category. We also see a summary of the operating systems and servers that are running and the installation count for each. The formal readiness assessment contains summary-level information about our server infrastructure, including an executive summary and a readiness analysis of our hardware. We can see some of the same data that we’ve already viewed with the MAP tool. We can see a software summary with information about device drivers, operating systems, installed services and applications, and we can even review a lot more detail which is not provided in the core MAP tool. In this Microsoft Excel® workbook we now get detailed information about every inventoried computer in the network, including the applications and installed devices. In the Server Assessment worksheet, columns show us whether each machine meets the requirements for running Windows Server 2008 R2. For example, some of these servers do not meet the requirements, such as the minimum disk space. So far we have inventoried our hardware environment and performed a readiness assessment for Windows Server 2008 R2. Now let’s take a look at how MAP can help us with Hyper-V virtualization and consolidation planning. There are four steps to this process. First we inventory the environment, which we’ve already done. Second, we gather performance counters and utilization metrics from all of the remote computers. Third, we configure hardware libraries for the host computers. And finally we consolidate servers and decide on placement of all the current physical and virtual servers. Now I’m quickly going to click through the performance metric wizards as we import computer names, provide WMI credentials, and define performance collection intervals.
As a best practice we recommend collecting performance data around peak utilization times, as this simulates an actual production environment. Once we have all of these metrics we can specify the hardware profile for the Hyper-V host machine. This process includes selecting a hardware profile configuration, choosing requirements such as CPU, storage, disk IO, and RAM, and finally configuring the host machine to meet all of the needs of our environment. Now we are going to complete the server consolidation planning and see how we can place the physical machines on the host computers that we configured in the previous step. In this exercise we select a Hyper-V technology platform, update the virtual host machine’s hardware configuration, manage assessment properties, and identify the guest computers for virtualization. Once we complete this step MAP automatically provides the consolidation results. We can generate both high-level summaries and detailed reports of the performance metrics for every server inventoried. This report includes a summary of the collective performance data with details on processor, network, memory, and logical disk utilization. Now let’s select the server consolidation results. Based on the hardware profile that we defined earlier, the summary gives us an idea of how the consolidated environment is going to look. As you can see from the summary, of the 29 physical servers we have, MAP has identified that 18 of them are virtualization candidates. Depending on the guest and host server configurations, you can make appropriate adjustments and consolidate even further. MAP also presents a view of the virtualization metrics before and after the consolidation assessment. As with all other scenarios, MAP can provide a comprehensive view of the consolidated environment through reports and proposals. For example, this proposal provides us with a summary review of the environment before and after virtualization, and it shows all of the technical aspects, including the host machine’s details and consolidation recommendations. This Excel report offers a comprehensive view of the environment before and after virtualization, showing all of these technical aspects, including unplaced machines. It also provides recommendations for virtualizing specific servers. Now one of the great aspects of MAP is the return on investment (ROI) assessment for the consolidation scenario. Using the integrated Microsoft virtualization calculator, we can estimate potential ROI and savings for the existing and virtualized environments. MAP provides a comprehensive report of the cost savings achieved through virtualization, including a cost-benefit analysis with average annual cost savings and a three-year ROI. The calculator provides us with a baseline analysis showing the potential benefits of a virtualized environment and also provides a report showing potential savings and ROI, including a competitive comparison. As you can see, MAP has a broad range of functionality and should be a great guideline for getting ready for your private cloud implementation.

14

15

Let’s talk about storage a little more, and understand some of the core requirements for deploying the storage area network (SAN) in your private cloud. First, the main question is: why a storage area network? Why can’t you just use local storage and have replication? The reason you really want a storage area network is so that any server can access any piece of data at any time. One challenge with using direct attached storage is that it is bound to a server. If that server has crashed, if it’s unavailable, or if it’s partitioned, you no longer have access to that data. By using a storage area network, however, you can ensure that any server can access any data at any time, giving you a lot more flexibility in this private cloud implementation. The storage area network is also a core requirement for clustering, as all of the nodes must be able to access the data. So this is yet another reason why you really want to think storage area network as you’re designing the storage fabric in the private cloud. Now there are four main implementations of storage area networks to consider. The first two use host bus adapters (HBAs) and traditional storage fabrics. Fibre Channel is extremely fast, but it can be a little pricey. On the other hand, serial attached SCSI, or SAS, is a much cheaper implementation, yet it can be a little slower. If you want an even cheaper solution, you can look at the Ethernet-based options, which include Fibre Channel over Ethernet (FCoE) and iSCSI. Using these technologies, you route all of the storage traffic over Ethernet rather than over a dedicated storage fabric. The advantage is that you have one less fabric to manage; however, you then have to worry about additional network bandwidth being allocated to storage rather than to network communication. Any time you use one of these Ethernet storage technologies, such as iSCSI, for your server, make sure that you give it a dedicated storage path, a dedicated network interface card (NIC), a dedicated Ethernet port. Network communication tends to be relatively frequent and steady, while storage traffic tends to come in sudden bursts. It will be a large file copy or a bulk move, and you want to ensure that that irregular burst of activity does not affect your regular network communication. You also need to consider high availability. By using RAID you can ensure that you have redundancy across all of your storage disks. If you lose a disk, generally you’ll be able to rebuild it by using parity bits from the other disks that are part of that array. Additionally, with storage you want to use multipath IO (MPIO). MPIO provides redundant paths to the storage fabric so that if one of those paths is unavailable, it’s not a problem; you can still continue to communicate with your storage fabric. If you’re using clusters in the solution, you want to consider using cluster shared volumes, a distributed-access file system introduced in Windows Server 2008 R2 and designed explicitly for Hyper-V. Using cluster shared volumes, nodes can access a disk simultaneously, which enables live migration: moving a running virtual machine from one server to another with no downtime.
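
If you go the clustered, multipath route described above, the underlying Windows features have to be installed on each host first. Here is a minimal sketch for Windows Server 2008 R2; the mpclaim command for claiming iSCSI devices ships with the Multipath-IO feature.

```powershell
# Install the failover clustering and multipath features on each host
Import-Module ServerManager
Add-WindowsFeature Failover-Clustering, Multipath-IO

# After a reboot, let MPIO claim iSCSI-attached devices for multipath handling
mpclaim -r -i -d "MSFT2005iSCSIBusType_0x9"
```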

16

Let’s review a classic high availability storage implementation. As a general approach, storage availability is achieved through multiple layers of redundancy, including HBA card teaming, dual switch interconnections, dual storage processors, and redundant disk arrays. Let’s take a quick look at one of the preferred high availability storage architectures. Microsoft Datacenter Services recommends this architecture as optimal for private cloud storage. The key is redundancy throughout. We have a mesh configuration where every component is speaking to at least two other components. This way any single item can fail, whether it’s one of the two HBAs, one of the two redundant switches, or one of the two storage processors, and we can still continue serving our environment. Remember, eliminating any single point of failure is going to be one of the first considerations when planning this private cloud infrastructure.

17

As part of your private cloud implementation, we recommend exploring the Microsoft iSCSI Software Target storage technology. In the past, iSCSI Software Target was available exclusively with storage-only offerings from Microsoft. However, for the first time Microsoft has now made it available to the public for free. Simply search the Microsoft website for “iSCSI Target” and you will find it in the Download Center (http://www.microsoft.com/download/en/details.aspx?id=19867). iSCSI is one of the four types of SAN solutions we discussed, and one of those that can be provided over Ethernet. If you want to deploy your private cloud, or even design a test infrastructure or implementation, by using the iSCSI Software Target you now have a free storage solution. iSCSI Software Target fully supports failover clustering, and iSCSI will be required for guest clustering, which involves clustering virtual machines and letting services fail over between different VMs. iSCSI Software Target uses only the existing IP network, and it can use any type of storage available, including direct attached storage or a storage array. You simply find these disks, add them to the iSCSI target, and then multiple servers can access them. This is a great, simple implementation that is extremely cost effective. We’ll see a demo of this deployment in the second module.
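
As a preview of what that setup looks like, here is a hedged sketch using the iSCSI Target PowerShell cmdlets that ship with the built-in iSCSI Target Server role in Windows Server 2012 and later; the free downloadable iSCSI Software Target 3.3 for 2008 R2 exposes the same operations through its management snap-in. The target name, path, and initiator IQNs are placeholders.

```powershell
# Create a virtual disk to serve as a LUN, define a target, and map them together
New-IscsiVirtualDisk -Path "D:\iSCSIVirtualDisks\CSV1.vhdx" -SizeBytes 200GB
New-IscsiServerTarget -TargetName "PrivateCloudCSV" -InitiatorIds "IQN:iqn.1991-05.com.microsoft:node1.contoso.com", "IQN:iqn.1991-05.com.microsoft:node2.contoso.com"
Add-IscsiVirtualDiskTargetMapping -TargetName "PrivateCloudCSV" -Path "D:\iSCSIVirtualDisks\CSV1.vhdx"
```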

18

19

Networking is a critical design component as well. Ethernet is one of the core requirements for the hardware. One of the critical components for high-availability networking, again, is redundancy, eliminating single points of failure. You want at least two of everything, two NICs, two HBAs, two aggregation ports, to provide the highest availability possible. For Ethernet, you want at least gigabit Ethernet (GbE), but 10 GbE is recommended. Of course, the faster the networking, the faster the operations; you’ll have higher performance and be able to access and interact with services much faster. Ethernet link aggregation is also a best practice, as it can provide even higher throughput for your networks. Another approach for high availability, in addition to redundancy, is to consider using NIC teaming. NIC teaming combines multiple NICs into a single logical path, so if one of those NICs becomes unavailable, traffic is rerouted to another NIC. This maintains high availability yet again by eliminating another single point of failure. One best practice as you’re deploying virtual networks across all of your servers, but particularly across the clusters, is to use identical virtual network names. This way a VM can perform the same and access the same network whether it is hosted on server one or server sixteen in the cluster. Additionally, consider using virtual local area networks (VLANs). A VLAN adds a layer of abstraction on top of the physical local area network, using virtualization to abstract where the physical network lies and create logical mini-networks. This can be ideal for server configuration and for automating operations, simply by reducing the number of variables tied to where a server is physically located. Using VLANs, you can stretch your networks across multiple servers or even across multiple data centers and have everything communicating using the same protocols on the same networks.
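
As an illustration of those last two recommendations, the sketch below creates a NIC team and tags a VM’s virtual network adapter with a VLAN ID. Note the assumption: the built-in NIC teaming and Hyper-V networking cmdlets shown here arrived in Windows Server 2012; on Windows Server 2008 R2, NIC teaming is provided by the NIC vendor’s own utility. Adapter, team, and VM names are placeholders.

```powershell
# Team two physical adapters into one logical adapter for VM traffic
New-NetLbfoTeam -Name "VMTeam" -TeamMembers "NIC1", "NIC2" -TeamingMode SwitchIndependent

# Place a VM's virtual network adapter into VLAN 20 (access mode)
Set-VMNetworkAdapterVlan -VMName "VM01" -Access -VlanId 20
```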

20

There are also some software requirements to consider as we look at this networking layer. First of all, you’re generally going to be working with third-party OEMs for networking components. You still want great integration with all of the Microsoft technologies, even though you’re no longer using Microsoft hardware. So one best practice is to make sure that all of the networking equipment you use can be managed with Windows PowerShell. PowerShell enables automation in your management, which can help with deployment, scripting, and immediately and automatically responding to any type of problem. Consider using DHCP in your infrastructure. Not only is it required for preboot execution environment (PXE) boot, but it’s also ideal for dynamic provisioning. For example, when a particular VM or service is becoming over-utilized, you want to be able to detect this and automatically start another virtual machine to respond to the oncoming increase in traffic. Using DHCP, you no longer have to worry about managing IP addresses for any of these dynamically provisioned servers. You can simply assign a DHCP address, and when the server comes online it will register that address in DNS and clients can access it. Much of the movement to the private cloud is about reducing repetitive tasks, and one of those repetitive tasks is IP address management. Using DHCP, it’s one less thing that you have to worry about. For networking security, make sure that you use the Microsoft Active Directory® directory service to take advantage of all of its built-in security features. Windows Server Update Services (WSUS) will reduce your network traffic further. Think of it as centralizing the update deployment model. Rather than having every server in your environment connect over the Internet to a Microsoft update server, one place that’s far away, you can have every server go to a local WSUS deployment, reducing the total amount of network traffic. Additionally, consider deploying Remote Desktop Services. It will give you the best access to all of your virtual machines across your entire infrastructure.
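
To make the DHCP point concrete, here is a minimal sketch of creating a scope for dynamically provisioned fabric VMs. It assumes the DhcpServer PowerShell module from Windows Server 2012 or later; on Windows Server 2008 R2 the equivalent is done with netsh dhcp or the DHCP console, and the scope values shown are illustrative.

```powershell
# Create an IPv4 scope that dynamically provisioned VMs will lease addresses from
Add-DhcpServerv4Scope -Name "Fabric VMs" -StartRange 10.0.20.50 -EndRange 10.0.20.250 -SubnetMask 255.255.255.0 -State Active

# New VMs lease an address and register it in DNS themselves, so there is no
# per-VM static IP bookkeeping to maintain.
```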

21

Next, let’s consider IPv6 support. IPv6 is the protocol of the future. As you may have heard by now, the number of remaining IPv4 addresses is limited. It’s a best practice to start using IPv6 in your private cloud deployment today so you don’t have to worry about a switchover several years down the line, which could cause significant problems or service downtime. IPv6 is fully supported by Microsoft. Every team is required to test and support it, including all of the Windows Server products, Hyper-V Server, and Microsoft System Center. However, you also need to ensure that all of the hardware and networking components in your environment that do not come from Microsoft support IPv6. Check with your OEM to ensure that this protocol is, and will continue to be, supported in your environment.
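
A couple of quick, hedged checks you can run from any Windows Server 2008 R2 or later host to confirm IPv6 is enabled and reachable end to end; the host name below is a placeholder.

```powershell
# List interfaces with IPv6 bound and their state
netsh interface ipv6 show interfaces

# Verify IPv6 reachability to another fabric host (placeholder name)
ping -6 fabric-host01.contoso.com
```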

22

Let’s talk a little more about the software that we need now.

23

As you plan your host operating system, you primarily have three choices to consider. The differences between them are the number of guest licenses included, the licensing terms for the actual servers, and the available management options. You may notice that the Windows Server 2008 R2 Standard and Web editions are not listed here. Those editions do not support clustering, and high availability is a critical component of every private cloud implementation. First let’s take a look at Hyper-V Server. We’ll cover the advanced feature set of Hyper-V Server in a little more detail, but it is essentially a free and stripped-down operating system. You can actually download it today purely to use as a virtualization host. The reason it’s not branded Windows Server is that it does not include all of the additional server roles and features that a traditional Windows Server would have, such as Active Directory and DHCP. Hyper-V Server focuses purely on the Hyper-V technology. Because it is free, it does not include any licenses for the VMs that run on it. So to run a Windows client in a VM you still need to license it. However, you could certainly use any type of free license, volume license, or open source software inside those guest operating systems. Windows Server 2008 R2 Enterprise Edition comes with a license for the host and also licenses for four guest VMs, so you essentially get five licenses in one. However, for a private cloud you really don’t want to be restricted by any type of licensing consideration. So we recommend going with Windows Server 2008 R2 Datacenter Edition, which includes the license for the host and unlimited licenses for VMs. This means that you have a lot more flexibility with dynamic provisioning, scaling out, or even just having elasticity in your environment. You no longer need to worry about mapping guest licenses to VMs, as you have an unlimited number. Looking at the actual host operating system licensing, as we know Hyper-V Server is free, Windows Server 2008 R2 Enterprise Edition is licensed at the server level, and Datacenter Edition at the CPU level. As for management, Hyper-V Server can be managed locally only through a Server Core-style command line, which means there is no graphical user interface directly on Hyper-V Server. Having said that, you still have the ability to connect to the Hyper-V Server remotely and manage it with a full GUI using the Remote Server Administration Tools, managing from another server using Server Manager, or even using System Center Virtual Machine Manager. Of course, Enterprise Edition and Datacenter Edition include both a full installation option and the option to use just Server Core.

24

Let’s talk a little bit more about Hyper-V Server. As we mentioned, it is a free virtualization host. You can go to Microsoft.com/hvs to get additional information and links to download it today. Because it’s built on Server Core, it has a reduced attack surface, meaning that a lot of additional components are not included, such as Internet Explorer. By having fewer components, there are fewer items that need to be serviced and patched, hence there will be fewer required updates for this operating system. So it has a lot of additional built-in resiliency compared to a full edition of Windows Server. Hyper-V Server also comes with a wealth of high availability features. It supports 16-node failover clustering, live migration, and cluster shared volumes. So you are still able to use Hyper-V Server as a core piece of your clustered fabric and your private cloud foundation. Although it is command-line based, Hyper-V Server provides an easy-to-use menu every time you launch it. Using a simple numerical set of inputs, you can change the domain membership, configure the computer name, adjust Windows Update settings, change the network settings, change the date and time, and even enable clustering or remote desktop connections.

25

Security is one of the core planning considerations that we need to think about upfront as well.

26

27

Active Directory is a critical component of the private cloud deployment. It provides common access across physical servers and virtual machines, and even extends up to the cloud, using a single identity store. Active Directory is a requirement simply because it ensures that the entire platform is highly secure. Remember, Active Directory can manage not just computers and servers, but also how users interact with the servers.

More importantly, Active Directory is an integrated solution covering almost every Microsoft product. You have Active Directory in the data center, which can give you identity management and federation. It can help with remote access and data protection. It provides a public key infrastructure and even checks to ensure that your network is protected and that you have the appropriate anti-malware deployed. It helps secure the desktop by ensuring appropriate security updates are available, and it helps with remote access and even provides additional protection on the web. As we move to the cloud, we think about additional identity management and security management policy, we think about federation between multiple data centers or multiple companies. And finally as we look at the productivity for our end user, Active Directory still helps us out with securing remote access, rights management, and protecting all of our data by having mail filtering and protection.

28

Let’s extend this into the private cloud and look at some of the best practices for our fabric management. First we recommend that all physical servers be members of the same forest. Ideally they’re in the same domain, to really simplify communication and avoid the additional complications of federated services, but as long as they’re in the same forest you’ll be able to manage them successfully. We recommend having a specific organizational unit (OU) tree and a primary root OU for this infrastructure as a service (IaaS) implementation. Additionally, we recommend having one OU for each resource pool for the host server accounts. A resource pool might be a group of servers, or a group of storage or networking hardware such as routers or network load balancers. We recommend having one OU for the fabric management servers and their service accounts. We recommend having one OU for the security groups that then provide access to tenants. The administrators that are granting access for the virtual machines should have limited rights as well. Try to break down the rights and the roles to ensure that no one has so many elevated privileges that they could do something malicious in the enterprise. By having designated organizational units for each function and role, you can specify what a person should be doing and make sure that they stay focused on that task and don’t interact with anything beyond their control. We recommend having dedicated group policy objects for these server configurations as well. With the group policy objects in place, you can make sure that any new servers or VMs that join your enterprise get these policies pushed down to them immediately. This way you can make sure that they’re highly secure and compliant as soon as they come online. We recommend having at least two domain controllers as well, and it’s best to have them running on bare metal as opposed to inside VMs, just to reduce the amount of translation that must be done between the physical layer and the virtual layer. Having said that, it is actually fine to run them in VMs, but if you do so, make sure that not all of your domain controllers are VMs running on the cluster. This is because when a cluster starts up it needs access to Active Directory, but if your domain controllers are all inside clustered VMs, they are not yet available. That would result in a startup deadlock: you can’t access the domain controllers because their VMs haven’t been started, and you can’t start the VMs that contain those domain controllers because the cluster hasn’t yet been given access to Active Directory. So the best practice is to make sure that you have at least one domain controller on bare metal.
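
To ground the OU and group policy recommendations, here is a minimal, hedged sketch using the ActiveDirectory and GroupPolicy modules available in Windows Server 2008 R2 and later. The OU names, GPO name, and domain are illustrative only, not a prescribed layout.

```powershell
Import-Module ActiveDirectory, GroupPolicy

# Root OU for the IaaS fabric, plus child OUs for host accounts and tenant access groups
New-ADOrganizationalUnit -Name "PrivateCloud" -Path "DC=contoso,DC=com"
New-ADOrganizationalUnit -Name "FabricHosts"  -Path "OU=PrivateCloud,DC=contoso,DC=com"
New-ADOrganizationalUnit -Name "TenantAccess" -Path "OU=PrivateCloud,DC=contoso,DC=com"

# Dedicated GPO linked to the host OU so new hosts pick up the baseline immediately
New-GPO -Name "Fabric Host Baseline" |
    New-GPLink -Target "OU=FabricHosts,OU=PrivateCloud,DC=contoso,DC=com"
```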

29

Let’s look a little more closely at why Active Directory is such a critical component. The primary reason we use Active Directory is that it’s integrated everywhere. It provides our identity management and our physical management layer, and covers applications, the operating system, the hypervisor, the hardware, and the network. These translate to Microsoft products: System Center, our primary server applications, Windows Server and the Windows client, Hyper-V virtualization, our servers and personal computers, and of course our network.

30

31

For the network, Active Directory can provide server and domain isolation. If you have a group of production servers and a group of test servers, you have the ability to isolate them by function to ensure that unnecessary traffic or communication does not happen between these groups of servers. Active Directory also provides network access protection, which prevents unauthorized people from coming into the network and doing anything intentionally malicious. It manages the firewall settings for all of your host servers, providing through group policy an easy way to ensure that each server is deployed correctly and secured correctly. And additionally its functionality is extended to partners, so even non-Windows host servers can still use some of the benefits from Active Directory when they’re being accessed.

32

Using Active Directory, all new hardware can immediately be provisioned and have a group policy applied so that it’s highly secure and has all of the right settings. Active Directory further enables System Center Configuration Manager to easily find newly deployed servers so that it can add them to the Configuration Manager resource pools. That ensures that, from now on, the new server always gets the most up-to-date patches and other requirements.

33

For virtualization, Hyper-V uses Active Directory for all of the administration. Hyper-V has a security feature called Authorization Manager that allows you to delegate specific tasks and roles to a scope of people or to a scope of servers. Using Active Directory and Authorization Manager, you can say that one group of admins only has access to a certain set of servers, another group of admins might be only self-service users that have the ability to manage but not create VMs. Active Directory gives you this granularity to help spread out and isolate each server role for each server user.

34

For the core Windows component, Active Directory lets you allow or prevent logons on certain machines. It provides delegated management for managing applications and services, and helps enable partners to use non-Windows machines.

35

Many applications actually use Active Directory as their identity store. So when you try to connect to a server, it’s going to check in Active Directory; if you try to manage a server, it’s going to authenticate you. By providing this common login and user information, Active Directory gives you extensive management across not just the servers but the applications as well. Using Active Directory, we can even extend these management principles to Office 365 and Windows Azure through federated services.

36

Microsoft System Center really encapsulates many of these components by providing our broad management layer. System Center is going to be one of the critical pieces of your private cloud solution, as it enables this automation and broad multi-server management. Any time you deploy an application using System Center Virtual Machine Manager, App-V, or System Center Configuration Manager, you can use Active Directory to ensure that only the appropriate machines are given access to these new services and that only the appropriate users can run them.

37

Finally, we look at broader identity management using Microsoft Forefront Identity Manager (FIM). This is essentially the private cloud implementation, the broad multi-server, multi-data-center implementation, of identity management on top of Active Directory. Forefront Identity Manager is the Active Directory management solution for the cloud that allows easy provisioning and deprovisioning. It allows synchronization across multiple environments or data centers. It provides the ability to have workflows with approvals, simplifying the approval process when new people join the organization and request services. And it provides end-to-end management of all groups using any type of application, operating system, or virtual machine.

38

Now that we have our core security infrastructure in place, we’ll look a little more into some of the more advanced deployment planning. Specifically we’re going to look at using high availability.

39

High availability throughout the stack is required, not just for your services and software, but also for the hardware itself.
• Having a single data center is great, but what happens if there is a regional power outage or some type of natural disaster? Having multiple data centers is an ideal solution for a private cloud implementation, so that if one data center is unavailable, services can continue to run in a different physical location.
• For network high availability, think about having hardware load balancers. Think about using NIC teaming to distribute incoming connections across NICs. Think about having redundant networks everywhere, and redundant network infrastructure such as switches and routers.
• For power, make sure that you have backup generators or batteries so that if there is a power outage you can quickly restore services and keep them up and running, even though there may be some type of performance degradation.
• We’ve already covered some of the high availability requirements for the storage area network. We specifically encourage you to look at using iSCSI Software Target, as it can provide high availability for both physical clusters and guest clusters, which are VMs running in parallel as a cluster. As you look at storage, don’t forget to use RAID and multipath IO, and consider using mirroring or replication as another high availability method. With mirroring and replication you take a copy of the data and move it to a different location. If the primary location or server crashes, you can still access that data and restore it in a different physical location.

Software high availability uses several technologies as well.
• The most traditional is backup and restore, which is a similar model to replication and mirroring: taking a snapshot of the data and, if needed, bringing it back online.
• Disaster recovery planning involves integrating a series of operations to recover all of these solutions once there is a crash. For example, if you lose the data center, as soon as the disaster recovery technologies detect that event, you can trigger a series of events to find the lost data, restore it in an ideal location, and reconnect clients to that server to minimize the service interruption.
• High availability for your networking infrastructure uses DNS round robin, which allows you to direct clients to multiple identical servers through DNS (a quick sketch follows this list).
• Failover clustering will be highlighted a little further in this section.
• IIS Application Request Routing (ARR) gives you load balancing at the software level, for layer seven or HTTP-type requests.
• Log shipping is a technology similar to mirroring which allows you to ship data logs to a different location so that you can quickly restore services.
• Software mirroring is functionally similar to the storage mirroring we’ve already discussed.
• Windows Network Load Balancing distributes traffic to multiple back-end servers at layers two and three. The advantage of load balancing is that it can distribute the load so that a particular server is not overwhelmed.
• The Recycle Bin is available for a few technologies, such as Active Directory, giving you the ability to recover objects even up to 30 days after they were deleted.
• Finally, there is traditional replication as well, which allows you to make copies of your data and move them to a different location so that they can be restored as needed.
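
Here is the quick DNS round robin sketch referenced above: two identical web front ends registered under the same host name, so DNS rotates which address clients receive. The dnscmd tool ships with the DNS Server role on Windows Server 2008 R2; the server, zone, and address values are placeholders.

```powershell
# Register two A records with the same name; DNS round robin rotates the answers
dnscmd dns01 /RecordAdd contoso.com web A 10.0.0.21
dnscmd dns01 /RecordAdd contoso.com web A 10.0.0.22

# Round robin is enabled by default; this re-enables it at the server level if needed
dnscmd dns01 /Config /RoundRobin 1
```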

40

As you can see there are a broad number of high availability technologies, both on the hardware level and on the software level. But we’re going to focus on failover clustering as our primary high availability solution for the private cloud. Failover clustering provides VM high availability by detecting the health of not only the VMs but also the operating systems running inside the VMs. If there’s a user mode hang or a blue screen, you can immediately notice this and then take corrective action such as restarting that VM on the same host or even moving it to a different host. We have application and service health detection as well, by using a concept known as guest clustering which clusters your VMs and allows a service to move freely between the VMs. Additionally with failover clustering, when there’s a crash, the recovery is automatic. You no longer have to depend on your IT admin being alerted by pager when there’s a crash. Failover clustering will immediately and automatically recover any type of service. Beyond high availability, failover clustering also provides VM mobility, which lets you move the VM to a different physical server. This is important in a private cloud implementation, because we want to abstract the physical layer from the services and the VMs. From an end user’s perspective, they don’t care whether their VM is running on node 1 or node 16, they just need it to function. By having the flexibility of moving VMs to different hosts, we have a lot more management opportunities on the physical layer. For example, we can patch the entire physical infrastructure without affecting any of the VM services, simply by moving VMs over to different servers in the cluster which are not being patched at the time. However, downtime is unavoidable. Downtime can happen for standard maintenance, performing an upgrade with the software or even the hardware, or performing an update with a hotfix or a security patch. And of course you need to protect against unplanned downtime such as accidents, power outages, or natural disasters. So it’s very important to consider high availability and failover clustering as you’re planning this private cloud implementation, to minimize the effects of any of these types of downtime.

41

Failover clustering is not just designed for Hyper-V virtualization. It actually supports almost every Windows Server feature and role, the most common ones being SQL Server, Exchange Server, and File Server, in addition to Hyper-V. Most of the Windows Server inbox features and roles are supported, as they have custom resource DLLs which tell a particular workload how to interact with the cluster and how to perform in this multi-server environment. If that still doesn’t meet your needs, failover clustering is an extensible platform. You can either write custom scripts or use generic containers to deploy almost any type of service or application on the failover cluster. Using custom scripts, you’re essentially writing a translation layer, the resource DLL, which maps cluster operations such as bringing a service online to specific operations for that workload, such as net start ADCS. By having this mapping, the cluster can automatically manage all of these customized components, and this can include third-party applications or line-of-business applications. However, many people do not have the time or skills to write their own resource DLL. For this reason, failover clustering also provides generic containers. Generic containers are essentially wizard-based implementations that allow you to enter several pieces of information and automatically cluster any type of executable application, script, or Windows service for automatic high availability as well.
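
The generic containers mentioned above map to cmdlets in the FailoverClusters PowerShell module that ships with Windows Server 2008 R2 and later. A hedged sketch; the service name, executable path, role names, and disk name are placeholders.

```powershell
Import-Module FailoverClusters

# Cluster an arbitrary Windows service with the generic service container
Add-ClusterGenericServiceRole -ServiceName "MyLobService" -Name "LOB-Svc" -Storage "Cluster Disk 3"

# Cluster an arbitrary executable with the generic application container
Add-ClusterGenericApplicationRole -CommandLine "C:\Apps\worker.exe" -Name "LOB-App"
```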

42

Let’s take a brief look at how clustering actually works. In this model we have a two-node cluster. All of our data is on shared storage. Remember, shared storage is going to be an important requirement in the private cloud so that any node can access the data. The node on the left is currently hosting several highly available roles. It looks like it has a virtual machine, a file server, some networking components, and a database, perhaps SQL Server. In this case our clients are accessing the services which are hosted on the node on the left. All of the data for those particular applications is on the shared storage. In the middle of this diagram is another network with a heartbeat, which is our health network that allows the different nodes in the cluster to ensure that each one is up and running. Using this health network, we can detect when one node is unavailable and then move the resources it was hosting to another node. Throughout the cluster is a cluster database, which is kept in the registry under HKEY_LOCAL_MACHINE in a Cluster key. This cluster database contains a detailed list of every resource in the cluster, which node is hosting each, and all the properties of each resource. This database is also replicated and kept consistent across every node in the cluster, so at any point in time every node knows what every other node in the cluster is doing. Now keeping this in mind, we’re going to crash our first node. Let’s say there’s a hard crash, a blue screen, it’s unexpected, unpredictable. The clients can no longer access the server, hence they can no longer access the services, and the services go offline. At this point our health detection is going to kick in, and the second node realizes that the first node is no longer responsive. The second node will look at its own cluster database and figure out what the first node was hosting. The second node knows that the first was hosting a VM, a file server, networking components, and a SQL database. Knowing that these services are no longer being provided, this second server will start up those services, access the application data for those services on the shared storage, and then allow the clients to reconnect, providing a route for the users to the service and the data that they need. This is an unplanned failover where a disaster has happened, but we have quickly and automatically recovered from it.
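
You can watch the same state from PowerShell with the FailoverClusters module that ships with Windows Server 2008 R2 and later; the cluster, role, and node names below are placeholders.

```powershell
Import-Module FailoverClusters

Get-ClusterNode  -Cluster PCCluster   # node state: Up, Down, or Paused
Get-ClusterGroup -Cluster PCCluster   # which node currently owns each clustered role

# Planned move of a role to another node, for example to drain a node for maintenance
Move-ClusterGroup -Name "SQL Server (INST1)" -Node NODE2 -Cluster PCCluster
```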

43

Clustering has a few core requirements. We’ll look at the networking first. It’s important to know that failover clustering is fully integrated with the Windows Server TCP/IP stack, it just works. When you deploy a cluster, there’s a validation test that you can run to ensure that all of the best practices are being followed and the solution will be supported. However, when you deploy this cluster you do need multiple networks for added redundancy, and depending on what the cluster is being used for you may need three, four, or even six networks in each cluster node. The first network will be dedicated for the public to access the data. You want to keep this network isolated since it potentially could be exposed to the Internet. By keeping public traffic isolated from the other networks, it limits the ability of any type of malicious attack, such as a denial of service attack through this network. The second network that you want to have is dedicated for cluster communication. This can include the heartbeats that we saw in the last example. Keep in mind that having this health network separated is a best practice, because when you’re doing your health checks, the cluster nodes are pinging each other. However, if there is too much traffic in that network, it could prevent some of the pings going through and potentially trigger a false failover. By having a dedicated network for this cluster traffic and cluster shared volumes traffic, we can help ensure that we never trigger a false failover. As we extend this cluster to use Hyper-V virtualization, we recommend that you have two dedicated networks for Hyper-V. The first is going to be used for live migration traffic. Live migration is the process of moving a running virtual machine from one physical host to another with no downtime. But the way you do this is by pushing a lot of data through the network as fast as you can. Remember, if you’re flooding the network, that could affect other communication going through that network, so it’s best to isolate this live migration network. Additionally, you want to have a dedicated network to manage the Hyper-V host itself. That lets you access the host as the administrator, but then perform management operations such as copying a VHD file from a library to that host. Now if you’re using one of the Ethernet protocols for your storage area network, which could be for the iSCSI protocol or the Fibre Channel over Ethernet protocol, you’ll want to have a dedicated NIC for this as well. Again this is because data traffic could be network-intensive and you don’t want that to affect any of the functions or operations of the other networks. Finally, if you’re using backup for the host or the guest running on that host, such as a third-party backup provider or Microsoft System Center Data Protection Manager 2010, we again recommend having a dedicated network for the same reason: you want to isolate the traffic so that it does not affect anything else. The good news with clustering is that it supports all of the IP protocols, IPv4, IPv6, and IPv4-to-IPv6 tunneling technologies. Additionally, you have the flexibility of using either DHCP or static IP addresses, not just for the cluster but for anything that’s running on the cluster, whether that’s a file server or a VM. As we look at some of the storage requirements, we have mentioned a few times that you are required to use a storage area network so that any node can access the data at any time. 
You have four choices: Fibre Channel, Fibre Channel over Ethernet, iSCSI, or serial attached SCSI. One of the critical requirements is that the storage must support SCSI-3 commands, specifically what are called persistent reservations. This is simply the mechanism that the cluster uses to ensure that one node can access a disk and no other node can take it over, but don’t worry about the details. Remember that clustering has a built-in validation test, a best practice analyzer, so it’s going to make sure that all of your storage works as part of those tests. Clustering supports either GUID partition table or master boot record as the disk type, and fully supports multipath IO. Additionally, any time you have Hyper-V, you want to use cluster shared volumes, the distributed-access file system for Hyper-V. Let’s take a look at how cluster shared volumes work in a little more detail.
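
Before we dig into cluster shared volumes, here is a minimal sketch of how the validation test and cluster creation just described look in the FailoverClusters module (Windows Server 2008 R2 and later); the node names and administrative IP address are placeholders.

```powershell
Import-Module FailoverClusters

# Run the built-in validation tests against the intended nodes, storage, and networks
Test-Cluster -Node NODE1, NODE2

# Create the cluster with a static administrative access point
New-Cluster -Name PCCluster -Node NODE1, NODE2 -StaticAddress 10.0.1.50
```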

44

With cluster shared volumes, you give every node the ability to access the same LUN at the same time. This is managed by having one node control access to the file system itself, while any other node can access and write to the raw blocks of data within that disk. Any time a node writes to raw blocks of data, it doesn’t risk interfering with or corrupting the integrity of the file system. However, any time you need to make an update to the file system itself, such as extending it, shrinking it, starting a VM, stopping a VM, or renaming a VM, any time you change the file system’s metadata, how it’s laid out and how it’s structured, that is all done through a control mechanism. When you look at the traffic that comes from a VM, you determine whether it is just a simple read or write, changing ones and zeroes in a block of data, or a change to the structure of the file system. If it changes the structure of the file system, it goes through one particular node, which is named the coordinator node. The coordinator node ensures that the file system integrity is maintained, and all of the other servers are then able to just perform raw reads and writes of ones and zeroes. The first benefit is that by using cluster shared volumes, you can pass your data over multiple networks, so if one particular network crashes or becomes unavailable, it’s not a problem; you can simply reroute data over another network. And remember, if you’re using a Hyper-V cluster you’re probably going to have more than just the three networks that you see in the example. If you think about having dedicated networks for cluster shared volumes, live migration, public traffic, and private traffic, you actually have multiple paths that you can reroute traffic through if one of them becomes unavailable. As we mentioned, every node can now access the storage, which differs from traditional clustering. With traditional clustering, let’s say with a file server, only one node can access the storage at a time, and if that file server needs to move because its node crashes, the storage moves with it. With cluster shared volumes, we provide simultaneous access to the data, so a particular disk is no longer bound to a particular host. That gives us full flexibility in how we move VMs throughout our environment, as each VM, and anything else running on the same disk as that VM, is now completely independent and free to move. Additionally, we gain some consolidation benefits. With traditional clustering, to return to our file server example, we were bound to use one disk for each service. If we were running, say, 100 file servers or 100 VMs with traditional clustering, we had to have 100 LUNs, and that introduced a lot of storage area network management complexity. But providing multiple services with access to a single disk significantly simplifies storage management. We no longer have to worry about particular services being bound to particular disks, as many services or many VMs can now run on a single LUN.
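
A short sketch of turning a clustered disk into a cluster shared volume and then live migrating a clustered VM; the disk, VM, and node names are placeholders, and the -MigrationType parameter shown is the Windows Server 2012 and later syntax (on 2008 R2 the cmdlet performs a live migration without it).

```powershell
Import-Module FailoverClusters

# Identify the clustered disk resource, then add it to Cluster Shared Volumes
Get-ClusterResource
Add-ClusterSharedVolume -Name "Cluster Disk 2"

# Live migrate a clustered VM to another node with no downtime
Move-ClusterVirtualMachineRole -Name "VM01" -Node NODE2 -MigrationType Live
```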

Finally, let’s explore virtualization and understand some of the critical planning considerations with virtualization in the private cloud.

46

First of all, let’s think about virtualization at multiple layers and how we provide high availability for it. Up to now we’ve been talking about host clustering, where we cluster the physical servers and can move the VMs or services from one physical host to a different physical host. This gives us the ability to patch a physical host without affecting any guest that is running on top of it. We also have the ability to use Fibre Channel, FCoE, serial attached SCSI, or iSCSI with this model. However, we can also do what’s called guest clustering, which we’ve alluded to a few times already. With guest clustering we’re actually clustering the VMs themselves, so the VMs are running Windows Server Enterprise Edition or Datacenter Edition, and there are services running inside those VMs. Those services inside the VMs, for example a SQL database inside a VM, now have the ability to move to a different VM to minimize the amount of downtime. This is ideal in scenarios where you have to patch the guest operating system itself. Remember, if you’re patching the host operating system you can move a VM to a different host; if you’re patching a guest operating system you need to move the service running inside that guest to another guest to avoid additional downtime. However, one caveat with using guest clustering is that you must use iSCSI, as there are no virtualized HBAs for the other storage protocols.
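
Inside each guest VM of such a guest cluster, the Microsoft iSCSI initiator has to be running and logged in to the shared LUN. Here is a hedged sketch using the built-in iscsicli quick-connect commands; the portal address and IQN are placeholders.

```powershell
# Start the iSCSI initiator service inside the guest and keep it set to automatic
Set-Service -Name MSiSCSI -StartupType Automatic
Start-Service -Name MSiSCSI

# Quick-connect to the target portal and log in to the shared LUN (placeholder values)
iscsicli QAddTargetPortal 10.0.2.10
iscsicli QLoginTarget iqn.1991-05.com.microsoft:storage01-privatecloudcsv-target
```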

47

One of the most common questions we get is about combining host and guest clustering: can you mix physical servers and guest VMs as members of the same cluster? The answer is yes, as long as they pass the built-in validation tests. When you mix physical clustering and guest clustering, you actually get the best of both worlds for high availability. You have high availability at the physical layer for maintenance on the physical hosts, and you have high availability at the virtual layer for keeping any type of virtualized service online and running at the same time.

48

As we think about using virtualization and running a lot of VMs on cluster nodes, we need to think about planning the density for the hosts themselves. A cluster supports up to 1,000 VMs, and this could be any size cluster, from four nodes up to 16. But remember when you deploy these VMs, it’s best to spread them out across all of the hosts, so you’re not overwhelming a few particular hosts. Clustering uses an active-active model, meaning that every server, if you want, can run a workload. There is no requirement to have one or more servers sitting completely passive. However, when you plan this density you also need to keep in mind some of the Hyper-V limits, which include 384 VMs per host (per node), as well as 512 virtual processors per host. But don’t just think about the initial capacity that a VM or host needs to handle; think about what happens when there’s a crash and one of the nodes in your cluster becomes unavailable. All of the VMs that were running on the host that just crashed must be spread across all of the other nodes throughout the cluster. You have to consider the additional capacity that each host may need to take on when there’s a crash, so that it can continue maintaining availability for all of the VMs throughout the environment. Additionally, when you’re thinking about having 1,000 VMs, consider what you’re actually doing inside each VM. One thousand might be an achievable number if you’re doing minimal work inside the VM itself, if you have thin clients running VDI sessions for example. However, if you have highly data-intensive workloads running inside the VMs, such as a SQL database, you’ll want to test that well, because SQL uses a lot more IO and memory, and so it’s going to use additional host resources. When you design this implementation, test with production workloads running inside the VMs to make sure that not only the host but also the networking can handle the throughput to your storage area network and all of those IOPS.
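
A rough, back-of-the-envelope capacity sketch for the failover headroom point above. The node count, RAM figures, and host reserve are illustrative assumptions, not product guidance.

```powershell
# Illustrative capacity planning: how many VMs fit while tolerating one node failure
$nodes       = 8        # nodes in the cluster
$ramPerNode  = 192GB    # physical RAM per node
$hostReserve = 4GB      # memory held back for the parent partition on each node
$ramPerVM    = 4GB      # average memory assigned per VM

# Plan for one failed node: its VMs must fit on the remaining nodes
$usableNodes = $nodes - 1
$vmCapacity  = [math]::Floor((($ramPerNode - $hostReserve) * $usableNodes) / $ramPerVM)
"Cluster can host roughly $vmCapacity VMs while tolerating one node failure"
```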

49

When you think about clustering and virtualization, remember you can run just about anything inside a Hyper-V virtual machine. All of these servers and services are supported components using Hyper-V. When all of the Microsoft engineering teams work on projects, there is a hard requirement that everything must be tested not only on bare metal but also running in a virtualized environment on Hyper-V. Every new Windows Server feature, component, and workload will run in a Hyper-V virtual machine. If you want more information about all of the workloads that are supported, check out KB article 957006, which is kept constantly up to date with all of these supported Hyper-V workloads.

50

Finally, a quick reminder about the Microsoft Private Cloud Fast Track program. As you can see, we have covered a lot of information about core private cloud infrastructure planning, but if you want to simplify the process of acquiring this hardware and series of technologies, consider working with one of the Private Cloud Fast Track partners for an optimized reference architecture. This architecture includes all of the components for compute, storage, and networking, it includes all of the configurations that you need, and it can be made highly available and support virtualization as soon as you get the product. So if you want to skip a lot of these complex planning steps, consider using the Microsoft Private Cloud Fast Track as a key reference architecture that gives you several choices and gets your private cloud deployed and running immediately.

51

As we wrap up, we want to quickly review that there are critical components throughout the fabric that you need to think about for planning. You have the servers, the storage, and the networking, as far as the physical hardware layer. You have the software and the security as far as the infrastructure layer. And then you want to add high availability and virtualization as some of your more advanced features. Our upcoming modules are going to look at more of the configuration and deployment as well as the management of this private cloud infrastructure.

52

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Active Directory, BitLocker, BizTalk, Excel, Forefront, Hyper-V, Internet Explorer, Lync, Microsoft Dynamics, PerformancePoint, SQL Server, Visual Studio, Windows, and Windows Server are registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.

MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

53