© 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public.

White Paper

Cisco Solution for EMC VSPEX with ScaleIO Platform

On Cisco UCS and VMware vSphere 5.5

Shivakumar Shastri

December 2014

This white paper describes the EMC VSPEX virtualized infrastructure solution on Cisco UCS, EMC ScaleIO, and VMware vSphere 5.5.

Acknowledgments

The author would like to acknowledge the following for their support and contributions:

Rajmohan Rajanayagam, Principal Solutions Engineer, EMC

Tripp Bridges, Solutions Engineer, EMC


Contents

Introduction
  Executive Summary
  Business Objectives
  Target Audience

Solution Overview
  Architecture
  Key Components
  High Availability

Solution Details
  Virtualization
  Compute Layer
  Network Layer
  ScaleIO Software

Sizing
  Reference Virtual Machine and Workload
  Scale Out
  Validated Building Blocks
  Configuration Guidelines

Deployment

Test and Validation
  Post-Installation Checklist
  Failure Testing
  Monitoring

References


Introduction

This document provides guidance on the technical aspects of an integrated infrastructure solution using EMC ScaleIO software-defined storage on Cisco Unified Computing System™ (Cisco UCS®) and VMware through the VSPEX program. VSPEX provides modular solutions built with technologies that enable faster deployment, greater simplicity, greater choice, higher efficiency, and lower risk. This white paper presents a complete system architecture capable of supporting virtual machines with a fault-tolerant server and network topology and highly available ScaleIO software.

Executive Summary

An integrated infrastructure brings together disparate compute, network, and storage products into a cohesive, prevalidated solution. In most cases, this approach eliminates compatibility and deployment hurdles but does not address operational inefficiencies stemming from complex features of the underlying components. One approach to simplifying management of such an integrated stack is to introduce an orchestration layer for automation. Hyperconvergence presents a simpler and more granular approach: it integrates widely available processing nodes, such as rack servers, without introducing operational complexity. This method allows for a more flexible, agile, and efficient infrastructure in which capacity can keep pace with demand at a lower price point. ScaleIO presents a hypervisor-agnostic, software-defined approach to serving compute and storage resources from a cluster of nodes for consumption by workloads within the cluster. The use of Cisco UCS servers with Cisco UCS Manager adds to operational efficiency through a single pane for firmware and console management of cluster servers. Further, the architecture with Cisco UCS servers and fabric interconnects is fault tolerant and provides local switching for traffic between servers within the same domain.

Business Objectives

Business applications are moving into consolidated compute, network, and storage environments. The EMC ScaleIO platform on Cisco UCS with VMware, delivered as a VSPEX proven infrastructure, reduces the complexity of configuring every component of a traditional deployment model. The solution simplifies integration management while maintaining application design and implementation options. It also provides unified administration while enabling adequate control and monitoring of process separation. The business benefits of this VSPEX architecture include:

● An end-to-end virtualization solution that effectively uses the capabilities of the unified infrastructure components

● A proven VSPEX solution on Cisco UCS with VMware for efficiently virtualizing compute resources for varied customer use cases

● A reliable, flexible, and scalable reference design

Target Audience

The readers of this document must have the necessary training and background to install and configure VMware vSphere 5.5 and associated infrastructure, including the Cisco UCS server platform, as required by this implementation. External references are provided where applicable, and readers should be familiar with these documents. Readers should also be familiar with the infrastructure and database security policies of the customer installation. Guidance is provided on sizing, configuration, and validation, with references for further reading.


Solution Overview

Architecture

The following is an overview of the VSPEX proven infrastructure platform and the key technologies used in the solution. The solution has been designed to provide virtualization, server, network, and storage resources, giving customers the ability to start with a right-sized deployment and scale as business demand grows.

Physical and Logical Architecture

The VSPEX solution for VMware vSphere with EMC ScaleIO validates the configuration for a specified number of virtual machines. System configuration determines the capacity of each node and hence the overall cluster workload. The sections that follow show that disk capacity, more than processing, I/O operations per second (IOPS), or memory, is the limiting factor in the number of virtual instances supported.

Note: VSPEX uses a reference workload to describe and define a virtual machine. Therefore, one physical or virtual server in an existing environment may not be equal to one virtual machine in a VSPEX solution. Evaluate your workload in terms of the reference to arrive at an appropriate point of scale.

The solution uses EMC ScaleIO software and VMware vSphere 5.5 on a cluster of Cisco UCS C240 M3 Rack Servers to provide the storage and virtualization platform. The clusters tested include a minimum configuration with three nodes and another consisting of seven nodes. The workload consists of Microsoft Windows Server 2012 virtual machines. The ScaleIO cluster is interconnected by a pair of low-latency 10 Gigabit Ethernet switches that allow the ScaleIO Data Client (SDC) on each processing node to access storage served up by remote ScaleIO Data Servers (SDS). Figure 1 shows a basic configuration of a ScaleIO cluster, consisting of a primary and a secondary Metadata Manager (MDM) and the tie-breaker node, with access requirements.

Figure 1. Basic Configuration of a ScaleIO Cluster


Figures 2 and 3 depict the configurations validated.

Option 1: Cisco UCS C240 M3 Rack Servers with Cisco Nexus® 5548 10 Gigabit Ethernet Switches (Figure 2)

This is the basic approach for providing low-latency switching for the cluster. The Cisco Nexus 5548 Switches provide redundant 10 Gigabit Ethernet low-latency connectivity between all the nodes within the ScaleIO cluster. External user access and access to enterprise services such as Active Directory and DNS are through separate VLANs set up on the Nexus 5548 Switches. While these switches provide sufficient bandwidth with the least amount of latency, a separate management network is still needed for console access to the servers.

Figure 2. Option 1: Cisco UCS C240 M3 Rack Servers with Cisco Nexus 5548 Switches


Option 2: Cisco UCS C240 M3 Rack Servers with Cisco UCS 6248UP Fabric Interconnects (Figure 3)

The Cisco UCS C240 servers serving as ScaleIO cluster nodes can be connected through the Cisco UCS Virtual Interface Card (VIC) 1225 to the Cisco UCS 6248UP 48-Port Fabric Interconnects to form a Cisco UCS domain. This method allows for a converged fabric, eliminating the additional cabling for cluster management required in option 1. The arrangement also introduces the features and functionality of Cisco UCS Manager, which allows for single-pane console, firmware, and LAN/SAN management of the servers within the ScaleIO cluster.

Figure 3. Option 2: Cisco UCS C240 M3 Rack Servers with 6248UP 48-Port Fabric Interconnects

Note: Please see the “Deployment” section, later in this white paper, for configuration details.


Key Components

This architecture includes the following key components:

● VMware vSphere 5.5: Provides a common virtualization layer to host a server environment. vSphere 5.5 provides highly available infrastructure through features such as:

◦ vMotion: Provides live migration of virtual machines within a virtual infrastructure cluster, with no virtual machine downtime or service disruption

◦ Storage vMotion: Provides live migration of virtual machine disk files within and across storage arrays, with no virtual machine downtime or service disruption

◦ vSphere High Availability: Detects and provides rapid recovery for a failed virtual machine in a cluster

◦ Distributed Resource Scheduler (DRS): Provides load balancing of computing capacity in a cluster

◦ Storage Distributed Resource Scheduler (SDRS): Provides load balancing across multiple datastores based on space usage and I/O latency

● VMware vCenter Server: Provides a scalable and extensible platform that forms the foundation for virtualization management for the VMware vSphere cluster. vCenter manages all vSphere hosts and their virtual machines.

● Microsoft SQL Server: VMware vCenter Server requires a database service to store configuration and monitoring details. This solution uses a Microsoft SQL Server 2012 database.

● Shared infrastructure: Adds DNS (name resolution) and authentication/authorization services, such as Active Directory, using existing infrastructure or set up as part of the new virtual infrastructure.

● Cluster network: A 10 Gigabit Ethernet network with either a pair of redundant Cisco Nexus 5548 Switches or a pair of Cisco UCS 6248UP fabric interconnects with VIC 1225 adapters in the C240 rack servers, carrying both management and storage traffic. A shared IP network carries user and management traffic into the cluster.

● EMC ScaleIO: Creates a server-based SAN from local server storage to deliver elastic, scalable performance and capacity on demand.

Hardware Resources

Table 1 lists the hardware used in this solution.

Table 1. Hardware Components

Component: Configuration

VMware vSphere servers: Cisco UCS C240 M3 Rack Servers
  Config 1: 3 nodes, each with 64 GB memory and 6 x 600-GB 10,000-rpm SAS disks
  Config 2: 7 nodes, each with 128 GB memory and 9 x 900-GB 10,000-rpm SAS disks

CPU: 1 vCPU per virtual machine; maximum of 4 vCPUs per physical core*

Memory: 2 GB RAM per virtual machine; 2 GB RAM for each physical server for the hypervisor; 3 GB RAM reservation for each ScaleIO virtual machine (SVM)

Network: 2 x 10 Gigabit Ethernet NICs per server

ScaleIO SVM: 3 GB RAM and 2 vCPUs for each SVM


Network infrastructure: 2 x Cisco Nexus 5548 Switches [or] 2 x Cisco UCS 6248UP fabric interconnects; 2 x 10 Gigabit Ethernet ports per VMware vSphere server

Shared infrastructure: In most cases, the customer environment will already have infrastructure services such as Active Directory and DNS configured. The setup of these services is beyond the scope of this document. If implemented without existing infrastructure, the minimum requirements are:

● 2 physical servers
● 16 GB RAM per server
● 4 processor cores per server
● 2 x 1 Gigabit Ethernet ports per server

*For Intel® Xeon® processors based on the Ivy Bridge microarchitecture or later, use 8 virtual CPUs per physical core.

Note: Add at least one additional server to the infrastructure beyond the minimum requirements to implement VMware vSphere High Availability functionality and to meet the listed minimums.

Software Resources

Table 2 lists the software resources for the solution.

Table 2. Software Resources

Software: Configuration

vSphere Server: Enterprise Edition
vCenter Server: Standard Edition
Operating system for vCenter Server: Microsoft Windows Server 2012 R2 Standard Edition
Microsoft SQL Server: 2012 Standard Edition
ScaleIO: 1.3
ScaleIO virtual machine: ScaleIO virtual machine release 1.3
Metadata Manager (MDM)/tie breaker: ScaleIO components release 1.3
ScaleIO Data Server (SDS): ScaleIO components release 1.3
ScaleIO Data Client (SDC): ScaleIO components release 1.3

Virtual machines (for validation, but not required for deployment)
Base operating system: Microsoft Windows Server 2012 R2 Datacenter Edition

Virtualization Layer

The virtualization layer is a key component of any private cloud solution. It decouples the application resource requirements from the underlying physical resources that serve them. This enables greater flexibility in the application layer by eliminating hardware downtime for maintenance, and it allows the system to change physically without affecting the hosted applications. It enables multiple independent virtual machines to share the same physical hardware rather than being directly implemented on dedicated hardware.

VMware vSphere 5.5: VMware vSphere 5.5 transforms the physical resources of a computer by virtualizing the CPU, RAM, hard disk, and network controller. This approach creates fully functional systems with dedicated and isolated instances of operating systems and applications, similar to physical computers.

The high-availability features of VMware vSphere 5.5, such as vMotion, enable seamless migration of virtual machines and stored files from one vSphere server to another, with minimal or no performance impact. Coupled with vSphere DRS, virtual machines have access to the appropriate resources at any point in time through load balancing of compute resources.


VMware vCenter: VMware vCenter is a centralized management platform for the VMware virtual infrastructure. This platform provides administrators with a single interface, accessible from multiple devices, for all aspects of monitoring, managing, and maintaining the virtual infrastructure.

VMware vCenter also manages some advanced features of the VMware virtual infrastructure, such as VMware vSphere High Availability, DRS, vMotion, and Update Manager.

EMC ScaleIO

EMC ScaleIO is a software-defined storage solution that uses local disks and LANs to create a virtual SAN with all the benefits of external storage, but at reduced cost and with less complexity. The lightweight ScaleIO software components are installed on the application hosts and intercommunicate over a standard LAN to handle the application I/O requests sent to ScaleIO block volumes. An extremely efficient decentralized block I/O flow, combined with a distributed, sliced volume layout, results in a massively parallel I/O system that can scale to thousands of nodes.

ScaleIO was designed and implemented with enterprise-grade resilience as a requirement. The software features efficient distributed automatic healing processes that overcome media and node failures without requiring administrator involvement. ScaleIO enables administrators to add or remove nodes and capacity “on the fly.” The software immediately responds to the changes, rebalancing the storage distribution to achieve a layout that optimally suits the new configuration. Because ScaleIO is hardware and hypervisor agnostic, the software works efficiently with various types of nodes, including a mix of types within the same cluster.

Software Components: The ScaleIO virtual SAN software consists of three components:

● Metadata Manager (MDM): Configures and monitors the ScaleIO system. The MDM can be configured in redundant cluster mode, with three members on three servers, or in single mode on a single server.

● ScaleIO Data Server (SDS): Manages the capacity of a single server and acts as a back end for data access. The SDS is installed on all servers that contribute storage devices to the ScaleIO system.

● ScaleIO Data Client (SDC): A lightweight device driver situated on each host whose applications or file systems require access to the ScaleIO virtual block devices. The SDC exposes block devices representing the ScaleIO volumes that are currently mapped to the host.

Software Architecture: ScaleIO consists of two major functional components: the ScaleIO Data Client (SDC) and the ScaleIO Data Server (SDS). The SDC is a block service driver that exposes ScaleIO shared block volumes to applications. The SDC runs locally on any application server that requires access to the block storage volumes in the cluster. The SDS is a software component installed on each server that contributes local storage to the overall ScaleIO storage pool. The SDS serves incoming read and write requests from any of the SDCs within the cluster. The SDC possesses full knowledge of the data locations throughout the cluster and always directs I/O requests to the correct destination SDS, whether on the same server or another server. Because the same hosts run applications and provide storage for the virtual SAN, the SDC and SDS are typically both installed on each of the participating hosts, such as a rack server, as shown in Figure 4.


Figure 4. Layout of SDS and SDC

Designed and implemented to consume a minimal amount of computing resources, the ScaleIO software components have a negligible impact on the applications running on the hosts.

Pure Block Storage Implementation: ScaleIO implements a pure block storage layout. The entire architecture and data path are optimized for block storage access. For example, when an application submits a read I/O request to the SDC, the SDC instantly deduces which SDS is responsible for the specified volume address and then interacts directly with the relevant SDS. The SDS reads the data (by issuing a single read I/O request to the local storage, or by simply fetching the data from the cache in a cache-hit scenario) and returns the result to the SDC. The SDC then provides the read data to the application.

This implementation is very simple, consuming as few resources as possible. The data moves over the network exactly once, and a maximum of one I/O request is sent to the SDS storage. The write I/O flow is similarly simple and efficient. Unlike block storage systems that run on top of a file system, or object storage that runs on top of a local file system, ScaleIO offers optimal I/O efficiency.
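This read path is easy to express in a few lines of code. The following Python fragment is a minimal sketch written for this paper, not ScaleIO source code; the chunk size, the owner-lookup table, and all class and variable names are assumptions made for illustration.

```python
# Minimal sketch of the read path described above (illustrative only; the
# chunk size, data structures, and names are assumptions, not ScaleIO code).

CHUNK_SIZE = 1024 * 1024  # assumed fixed chunk size for the sketch


class SDS:
    """Owns a set of chunks and serves them from cache or local disk."""

    def __init__(self, name):
        self.name = name
        self.cache = {}       # chunk_id -> data (cache-hit path, no disk I/O)
        self.local_disk = {}  # chunk_id -> data (single local read otherwise)

    def read_chunk(self, chunk_id):
        if chunk_id in self.cache:
            return self.cache[chunk_id]
        return self.local_disk[chunk_id]  # at most one I/O to local storage


class SDC:
    """Routes each request directly to the owning SDS; no central hop."""

    def __init__(self, chunk_owner_map):
        # Full knowledge of data locations across the cluster.
        self.chunk_owner_map = chunk_owner_map  # chunk_id -> SDS instance

    def read(self, volume_offset):
        chunk_id = volume_offset // CHUNK_SIZE
        owner = self.chunk_owner_map[chunk_id]  # deduce the responsible SDS
        return owner.read_chunk(chunk_id)       # one network round trip
```

Because every SDC holds the full chunk-to-SDS map, a read costs one network round trip and at most one disk I/O, which is the efficiency claim made above.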

Massively Parallel, Scale-Out I/O Architecture: ScaleIO can scale to many nodes, thus breaking the traditional scalability barrier of block storage. Because the SDCs propagate I/O requests directly to the pertinent SDSs, there is no central point through which the requests move, and a potential bottleneck is avoided. This decentralized data flow is important to the linearly scalable performance of ScaleIO. A large ScaleIO configuration is therefore a massively parallel system: the more servers or disks the system has, the greater the number of parallel channels available for I/O traffic.


Mix-and-Match Nodes: The vast majority of traditional scale-out systems are based on a symmetrical brick architecture, in which the same node configuration is used throughout a cluster. Such symmetric scale-out architectures are likely to run in small islands. ScaleIO was designed to support a mixture of new and old nodes with dissimilar configurations.

Volume Mapping and Volume Sharing: The volumes that ScaleIO exposes to the application clients can be mapped to one or more clients running on different hosts. Mapping can be changed dynamically if necessary, and ScaleIO volumes can be used by applications that expect shared-everything block access as well as by applications that expect shared-nothing or shared-nothing-with-failover access.

High Availability

This VSPEX solution provides a highly available virtualized server, network, and storage infrastructure. When the solution is implemented following the instructions in this white paper, business operations survive single-unit failures with little to no impact.

The VMware vSphere High Availability feature enables the virtualization layer to automatically restart virtual machines in various failure conditions:

● If the virtual machine operating system has an error, the virtual machine can automatically restart on the same hardware.

● If the physical hardware has an error, the affected virtual machines can automatically restart on other servers in the cluster.

Note: For virtual machines to restart on different hardware, the servers must have available resources. The “Compute Layer” section, later in this white paper, provides detailed information on enabling this function.

With vSphere High Availability, you can configure policies to determine which machines automatically restart and under what conditions these operations are attempted.


Solution Details

Virtualization

VMware vSphere 5.5 includes advanced features that help maximize performance and overall resource use. The most important of these features relate to memory management. This section describes some of these features and what to consider when using them in the environment.

In general, virtual machines on a single hypervisor consume memory as a pool of resources, as shown in Figure 5 for a system with 64 GB of memory (configuration 1).

Figure 5. Memory Configuration for Virtual Machines on a Single Hypervisor (Configuration 1)


Memory Compression

Memory overcommitment occurs when more memory is allocated to virtual machines than is physically present in a VMware vSphere host. Using sophisticated techniques such as ballooning and transparent page sharing, vSphere 5.5 can handle memory overcommitment without performance degradation. However, if memory usage exceeds server capacity, vSphere might swap out portions of a virtual machine's memory.

Nonuniform Memory Access (NUMA)

vSphere 5.5 uses a NUMA load balancer to assign a home node to each virtual machine. Because the home node allocates the virtual machine's memory, memory access is local and provides the best possible performance. Applications that do not directly support NUMA also benefit from this feature.

Transparent Page Sharing

Virtual machines running similar operating systems and applications typically have similar sets of memory content. Page sharing enables the hypervisor to reclaim redundant copies of memory pages and keep only one copy, which reduces total host memory consumption. If most of your application virtual machines use the same operating system and application binaries, total memory usage can be reduced, increasing consolidation ratios.

Memory Ballooning

By using a balloon driver loaded in the guest operating system, the hypervisor can reclaim host physical memory when memory resources are under contention, with little or no impact on application performance.

Memory Configuration Guidelines

This section provides guidelines for allocating memory to virtual machines. The guidelines take into account vSphere memory overhead and the virtual machine memory settings.

vSphere Memory Overhead: Some overhead is expected for the virtualization of memory resources. The memory space overhead has two components:

● The fixed system overhead for the VMkernel
● Additional overhead for each virtual machine

Memory overhead depends on the number of virtual CPUs and the amount of memory configured for the guest operating system.

Allocating Memory to Virtual Machines: Many factors determine the proper sizing for virtual machine memory in VSPEX architectures. With the number of application services and use cases available, determining a suitable configuration for an environment requires creating a baseline configuration, testing it, and making adjustments for optimal results.

Compute Layer

VSPEX documents minimum requirements for the number of processor cores and the amount of RAM. The compute node used for this solution is the Cisco UCS C240 M3 Rack Server, which can hold up to 24 internal disks of varying capacity and performance. The tested configuration contains nodes, each with two Intel Xeon E5-2650 v2 processors and 96 GB of memory. In general, the infrastructure must have the following attributes:

● Sufficient cores and memory to support the required number and types of virtual machines

● Sufficient network connections to enable redundant connectivity to the system switches

● Sufficient capacity to enable the environment to withstand a server failure and failover


ScaleIO components are designed to work with a minimum of three server nodes. The physical server node, running VMware vSphere, can host other workloads beyond the ScaleIO virtual machine. The implementation described in this paper contains seven nodes, each with nine 900-GB 10,000-rpm SAS internal disks.

Note: To enable high availability at the compute layer, the customer will need one additional spare server to ensure that the system has enough capacity to maintain business operations when a server fails.

Best Practices in the Compute Layer

Use of identical, or at least compatible, servers is preferred, even though ScaleIO can accommodate different server types within a cluster. This is because VSPEX implements hypervisor-level high-availability technologies that may require similar instruction sets and capabilities from the underlying physical hardware. By implementing ScaleIO on identical server units, you can minimize compatibility problems in this area.

If high availability is implemented at the hypervisor layer, the largest virtual machine that can be created is constrained by the smallest physical server in the environment.

Implement high-availability features in the virtualization layer, and ensure that the compute layer has sufficient resources to accommodate at least single-server failures. This helps ensure minimal downtime during upgrades and maintenance, with tolerance for single-unit failures.

Within the boundaries of these recommendations and best practices, the compute layer for EMC VSPEX can be flexible enough to meet your customer's specific needs. Ensure that there are sufficient processor cores and RAM per core to meet the needs of the target environment.

Configuration Guidelines

When designing and ordering the compute/server layer of this VSPEX solution, assuming the system workload is well understood, features such as memory ballooning and transparent page sharing can reduce the aggregate memory requirement. If the virtual machine pool does not have a high level of peak or concurrent usage, reduce the number of virtual CPUs. Conversely, if the applications being deployed are highly computational in nature, increase the number of CPUs and the amount of memory purchased.

Intel Xeon Updates

Testing on Intel Xeon processors based on the Ivy Bridge microarchitecture has shown significant increases in virtual machine density from the server resource perspective. If your server deployment consists of these or later processors, we recommend increasing the ratio of virtual CPUs to physical CPUs from 4:1 to 8:1. This essentially halves the number of server cores required to host the reference virtual machines.

Figure 6 shows the results from the tested configurations.


Figure 6. Results of Testing the Intel Ivy Bridge Microarchitecture (Xeon Processors)

Current VSPEX sizing guidelines require a maximum ratio of virtual CPU cores to physical CPU cores of 4:1, with a maximum 8:1 ratio for Ivy Bridge or later Intel Xeon processors. The 4:1 ratio was based on an average sampling of the CPU technologies available at the time of testing.

Table 3 lists the hardware resources used at the compute layer by VMware vSphere servers.

Table 3. Hardware Resources Used by VMware vSphere Servers

Component: Configuration

CPU: 1 vCPU per virtual machine; maximum of 4 vCPUs per physical core
Memory: 2 GB RAM per virtual machine; 2 GB RAM reservation per VMware vSphere host
Network: 2 x 10 Gigabit Ethernet NICs per server

Note: Add at least one additional server to the infrastructure beyond the minimum requirements to implement VMware vSphere High Availability functionality and to meet the listed minimums.


Note: The solution recommends a 10 Gigabit Ethernet network, or an equivalent 1 Gigabit Ethernet network infrastructure, as long as the underlying requirements for bandwidth and redundancy are fulfilled.

Network Layer

The tested configuration consists of either a pair of Cisco Nexus 5548 Switches or redundant Cisco UCS 6248UP fabric interconnects. When the Cisco UCS 6248UP is used with C240 servers and the VIC 1225, the interconnects provide additional operational benefits, such as firmware, console, and LAN/SAN management from a single tool, Cisco UCS Manager. Event monitoring of server hardware, the flexibility to grow the ScaleIO cluster without additional effort, and multipathing are some of the advantages of building a Cisco UCS domain using the 6248UP fabric interconnects. The infrastructure must provide the following attributes:

● Redundant network links for the hosts, switches, and storage

● Traffic isolation based on industry-accepted best practices

● Support for link aggregation

This section provides requirements for a redundant and highly available network. Please refer to the “Deployment” section, later in this white paper, for details on network setup. The guidelines consider VLANs, the Link Aggregation Control Protocol (LACP), the ESXi server, and the ScaleIO layer.

Component: Configuration

Network infrastructure: 2 physical Cisco Nexus 5548UP LAN switches with 2 x 10 Gigabit Ethernet ports per VMware vSphere server [or] 2 physical Cisco UCS 6248UP fabric interconnects with redundant converged 10G Fibre Channel over Ethernet (FCoE) connectivity to each server

Logical network traffic isolation between hosts and storage, hosts and clients, and management traffic is available and provided in both setups.

Figure 7 shows the Cisco Nexus 5548UP setup, with logical VLAN separation.


Figure 7. Cisco Nexus 5548UP Switch Configuration

Figure 8 shows the Cisco UCS 6248UP fabric interconnect setup, with converged 10G FCoE for management and storage traffic for the rack servers below the fabric interconnects.


Figure 8. Cisco UCS 6248UP Fabric Interconnect Configuration

You can use the client access network to communicate with the ScaleIO infrastructure. The storage network provides communication between the ScaleIO nodes. Administrators use the management network as a dedicated way to access the management connections on the hosts.

ScaleIO Software

This section provides guidelines for setting up the storage layer of the solution to provide high availability and the expected level of performance. VMware vSphere 5.5 allows more than one method of presenting storage to virtual machines. The tested solution uses block protocols, and the ScaleIO layer described in this section follows all current best practices. VMware vSphere provides host-level storage virtualization: it virtualizes the physical storage and presents the virtualized storage to the virtual machines.

A virtual machine stores its operating system and all the other files related to its activities in a virtual disk. The virtual disk itself consists of one or more files. VMware uses a virtual SCSI controller to present virtual disks to the guest operating system running inside a virtual machine.

Virtual disks, as shown in Figure 9, reside on a datastore. Depending on the protocol used, a datastore can be a VMware VMFS datastore. Another option, Raw Device Mapping (RDM), allows the virtual infrastructure to connect a physical device directly to a virtual machine. In this ScaleIO solution, a VMFS datastore or RDM is used as the device to provide disk capacity.


Figure 9. Virtual Disk Configuration

VMFS

VMFS is a cluster file system that provides storage virtualization optimized for virtual machines. It can be deployed over any SCSI-based local or network storage.

Raw Device Mapping (RDM)

VMware also provides RDM, which allows a virtual machine to directly access a volume on the physical storage.

Note: We recommend using RDM mapping in the vSphere environment for this workload. The device is created on the ScaleIO virtual machines and points to the physical disk on the vSphere server.

Redundancy Scheme and Rebuild Process

ScaleIO software uses a mirroring scheme to protect data against disk and node failures (Figure 10). The architecture supports a distributed two-copy scheme. When an SDS node or SDS disk fails, applications can continue to access ScaleIO volumes, because the data is still available through the remaining mirrors. ScaleIO software immediately starts a seamless rebuild process that creates another mirror for the data chunks that were lost in the failure. In the rebuild process, those data chunks are copied to free areas across the SDS cluster, so it is not necessary to add any capacity to the system.

The surviving SDS cluster nodes carry out the rebuild process using the aggregated disk and network bandwidth of the cluster. The process is dramatically faster than a traditional rebuild, resulting in a shorter exposure time and less application performance degradation. After the rebuild, all the data is fully mirrored and healthy again. If a failed node rejoins the cluster before the rebuild process is completed, ScaleIO software dynamically uses data from the rejoined node to further minimize the exposure time and the use of resources. This capability is important for overcoming short outages efficiently.
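The two-copy protection and distributed rebuild can be pictured with a short sketch. The following Python fragment is a simplified model written for this paper; the round-robin placement, the per-chunk granularity, and all names are assumptions, not EMC's actual placement algorithm.

```python
# Simplified two-copy placement and rebuild (assumptions: round-robin
# placement and per-chunk granularity; not EMC's actual algorithm).
import itertools


def place_mirrors(chunk_ids, nodes):
    """Assign each chunk two mirrors on two distinct nodes."""
    placement = {}
    ring = itertools.cycle(range(len(nodes)))
    for chunk in chunk_ids:
        first = next(ring)
        second = (first + 1) % len(nodes)  # second copy on a different node
        placement[chunk] = (nodes[first], nodes[second])
    return placement


def rebuild_targets(placement, failed_node, nodes):
    """For every chunk that lost a mirror, pick a surviving node that does
    not already hold the remaining copy. Because the copies are spread over
    the whole cluster, every surviving disk and link shares the rebuild."""
    survivors = [n for n in nodes if n != failed_node]
    targets = {}
    for chunk, (a, b) in placement.items():
        if failed_node in (a, b):
            remaining = a if b == failed_node else b
            candidates = [n for n in survivors if n != remaining]
            targets[chunk] = (remaining, candidates[chunk % len(candidates)])
    return targets
```

Running place_mirrors(range(12), ["sds1", "sds2", "sds3"]) and then simulating a failure with rebuild_targets shows the lost chunks being re-mirrored across all survivors rather than onto a single spare, which is why rebuild time shrinks as the cluster grows.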


Figure 10. ScaleIO Topology

Elasticity and Rebalancing

Unlike many other systems, a ScaleIO cluster is extremely elastic. Administrators can add and remove capacity and nodes on the fly, during I/O operations.

When a cluster is expanded with new capacity (such as new SDSs, or new disks added to existing SDSs), ScaleIO immediately responds to the event and rebalances the storage by seamlessly migrating data chunks from the existing SDSs to the new SDSs or disks. Such a migration does not affect the applications, which continue to access the data stored in the migrating chunks. By the end of the rebalancing process, all ScaleIO volumes are spread across the SDSs and disks, including the newly added ones, in an optimally balanced manner. Thus, adding SDSs or disks not only increases the available capacity but also increases the performance of the applications as they access their volumes.

When an administrator decreases capacity (for example, by removing SDSs or removing disks from SDSs), ScaleIO performs a seamless migration that rebalances the data across the remaining SDSs and disks in the cluster.

Note: In all types of rebalancing, ScaleIO migrates the least amount of data possible. ScaleIO has the flexibility to accept new requests to add or remove capacity while still rebalancing previous capacity additions and removals.
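A toy model of the expansion case, written for this paper, illustrates why only a minimal amount of data moves. The even-spread heuristic, the chunk granularity, and the names are assumptions, not ScaleIO internals.

```python
# Illustrative rebalancing after adding a node: move just enough chunks
# from the existing nodes so that all nodes hold roughly equal shares.

def rebalance(chunks_per_node, new_node):
    chunks_per_node[new_node] = chunks_per_node.get(new_node, 0)
    total = sum(chunks_per_node.values())
    target = total // len(chunks_per_node)
    moves = []
    for node in list(chunks_per_node):
        while chunks_per_node[node] > target and chunks_per_node[new_node] < target:
            chunks_per_node[node] -= 1
            chunks_per_node[new_node] += 1
            moves.append((node, new_node))
    return moves  # each move is one chunk migration; I/O continues meanwhile

# Three nodes holding 12 chunks each: adding a fourth moves only 9 chunks
# (3 from each existing node), the least data needed to even out the layout.
print(len(rebalance({"sds1": 12, "sds2": 12, "sds3": 12}, "sds4")))
```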


Sizing

This section defines the reference workload used to size and implement the VSPEX architecture. Sizing the environment includes designing the nodes that will be used for the ScaleIO environment and specifying the number of those nodes. This section provides details on how variations in node configuration and in the number of nodes in a cluster affect the number of virtual instances that can be supported per host. The virtual machines used in this section correspond to the VSPEX definitions of those workloads.

Reference Virtual Machine and Workload

A reference virtual machine captures the basic resources needed by a virtual machine, with the intent of using it as a unit of measurement for scaling and sizing purposes. Once the reference is defined, an actual customer application workload can be compared to it to arrive at sizing information for the platform. In any discussion about virtual infrastructure, the first step is to define this reference workload. It is important to note, however, that not all servers perform similar tasks or host the same workloads, so evaluate each application individually for an accurate estimate.

VSPEX solutions define a reference virtual machine (RVM) workload, which represents a common point of comparison. Table 4 describes this workload.

Table 4. VSPEX Reference Workload

Parameter: Value

Virtual machine OS: Windows Server 2012 R2
Virtual CPUs: 1
Virtual CPUs per physical core (maximum): 4
Memory per virtual machine: 2 GB
IOPS per virtual machine: 25
I/O pattern: Fully random (skew = 0.5)
I/O read percentage: 67%
Virtual machine storage capacity: 100 GB

Scale Out

ScaleIO is designed to scale from three to many nodes. Unlike most traditional storage systems, as the number of servers grows, so do capacity, throughput, and IOPS. The scalability of performance is mostly linear as the cluster grows. Storage and compute resources grow together, as in the case of rack servers, so the balance between them is maintained.

Validated Building Blocks

VSPEX uses a building-block approach to reduce complexity. A building block is one specific server node that can support a certain number of virtual servers in the VSPEX architecture. Each building block combines several local disk spindles to contribute to a shared ScaleIO volume that supports the needs of either a virtualized or a private cloud environment.

Both the SDS and the SDC are installed on each building-block node. The SDS presents local disks to a ScaleIO storage pool, which then exposes ScaleIO shared block volumes on which the virtual machines run. The SDC allows the local compute resources on the node to use the shared block volumes.


The configuration of a reference building block includes the physical CPU core count, memory size, and disk spindle count for a server.

Table 5 shows one specific node in a three-node cluster that was validated and provides a flexible solution for VSPEX sizing.

Table 5. Building Block Node Configuration

Node Parameter: Target Value (Notes)

CPU: 6 cores
Memory: 64 GB (this configuration can support up to 30 virtual machines)
Disks: 6 x 600-GB 10,000-rpm SAS (capacity, not IOPS, is the limit on the number of virtual machines supported)

This configuration contains six SAS disks per node. The validated solution modeled these drives at 600 GB each. For this workload definition, drive capacity was more limiting than drive IOPS. With this configuration, up to 12 virtual machines can be supported by one building block (node).

The node configuration in Table 5 defines the CPU, memory, and disk configuration for one server. However, ScaleIO is infrastructure agnostic and can run on any server, so this VSPEX solution also provides more options for the building-block node configuration. Users can redefine a building block with a different configuration, but after the building-block configuration is redefined, the number of virtual machines that the building block can support also changes.

To calculate the number of virtual machines that a new building block can support, consider the following components:

CPU Capability

With a recommended maximum of 4 virtual CPUs per physical core in a virtual machine environment, a server node with 16 physical cores can support up to 64 virtual machines.

Memory Capability

When sizing the memory for a server node, the ScaleIO virtual machine and the hypervisor must be considered: the ScaleIO virtual machine consumes 3 GB of RAM, and 2 GB of RAM is reserved for the hypervisor. We do not recommend using memory overcommitment in this environment.

Note: ScaleIO 1.3 introduces a RAM cache feature that uses the SDS server's RAM. By default, the RAM size of the ScaleIO virtual machine is set to 3 GB, of which 128 MB is used for the SDS RAM cache. If a larger RAM cache is used, add the additional cache size to the 3 GB of the ScaleIO virtual machine.

Disk Capacity

ScaleIO uses a RAIN (Redundant Array of Inexpensive Nodes) topology to ensure data availability. In general, the capacity available is a function of the formatted capacity per node and the number of nodes available (Table 6).

Assuming N nodes and C TB of formatted capacity per server, the usable storage available, S, is:

S = (N − 1) × C / 2

This formula accounts for two copies of the data (the divisor of 2) and the ability to survive a single node failure (the N − 1 term).
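As a quick check, the short calculation below, a sketch written for this paper, applies the formula to the validated three-node building block (six drives per node and 100-GB reference virtual machines, both inferred from Tables 4 and 5) and reproduces the per-node counts in Table 6.

```python
# Usable ScaleIO capacity: S = (N - 1) * C / 2, computed in GB to keep the
# arithmetic exact. The three-node, six-drive layout is inferred from Table 5.

def usable_capacity_gb(nodes, capacity_per_node_gb):
    """Two data copies, with room to survive one node failure."""
    return (nodes - 1) * capacity_per_node_gb // 2

NODES = 3
DRIVES_PER_NODE = 6
VM_CAPACITY_GB = 100  # per reference virtual machine (Table 4)

for drive_gb in (600, 900, 1200):
    per_node_gb = DRIVES_PER_NODE * drive_gb
    usable_gb = usable_capacity_gb(NODES, per_node_gb)
    vms_per_node = usable_gb // NODES // VM_CAPACITY_GB
    print(f"{drive_gb}-GB drives: {vms_per_node} VMs per node")

# Prints 12, 18, and 24 VMs per node, matching Table 6.
```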


Table 6. Theoretical Maximum Number of Virtual Machines per Node (Capacity Based)

10,000-rpm SAS Drive Size: Number of Virtual Machines

600 GB: 12
900 GB: 18
1200 GB: 24

The primary method of adding IOPS capability to a node, cache technologies aside, is to increase either the number of disk units or the speed of those units.

Determining the Maximum Number of Virtual Machines in a Building Block Node

With the entire configuration defined for the building-block node, we calculate the number of virtual machines that each component can support; the smallest of these values is the number of virtual machines that the building-block node can support. For example, if the customer redefines the building block with 16 physical CPU cores, 64 virtual machines can be supported (16 cores × 4 virtual machines per core); with 192 GB of memory, 93 virtual machines can be supported (after 2 GB is reserved for the hypervisor and 3 GB for the ScaleIO virtual machine); and with 8 SAS drives, 45 virtual machines can be supported, based on the IOPS limit. Therefore, the theoretical maximum, determined by the lowest of these values (IOPS), gives a limit of 45 virtual machines for this building-block node.

Note: The actual number observed for each physical server is further reduced, because the disk capacity limit is lower than the IOPS limit.
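The component-by-component calculation above can be captured in a few lines. The sketch below was written for this paper; the per-drive IOPS figure is an assumption chosen to reproduce the 45-VM result (8 drives × 140 IOPS ≈ 1,120 IOPS, divided by 25 IOPS per reference virtual machine), not a value specified by VSPEX.

```python
# Sizing a redefined building block: the lowest per-component limit wins.
# The 140 IOPS-per-drive figure is an illustrative assumption.

HYPERVISOR_GB = 2    # RAM reserved for the hypervisor
SCALEIO_VM_GB = 3    # RAM reserved for the ScaleIO virtual machine
VM_MEMORY_GB = 2     # reference VM memory (Table 4)
VM_IOPS = 25         # reference VM IOPS (Table 4)
VCPUS_PER_CORE = 4   # maximum vCPU:core ratio (Table 4)

def max_vms(cores, memory_gb, drives, iops_per_drive=140):
    cpu_limit = cores * VCPUS_PER_CORE
    memory_limit = (memory_gb - HYPERVISOR_GB - SCALEIO_VM_GB) // VM_MEMORY_GB
    iops_limit = round(drives * iops_per_drive / VM_IOPS)
    return min(cpu_limit, memory_limit, iops_limit)

# The example from the text: 16 cores, 192 GB RAM, 8 SAS drives -> 45 VMs,
# limited by IOPS (CPU allows 64, memory allows 93).
print(max_vms(cores=16, memory_gb=192, drives=8))
```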

Configuration Guidelines

To choose the appropriate reference architecture for a customer environment, determine the resource requirements of the environment and then translate those requirements into the appropriate number of reference virtual machines, as defined earlier in Table 4. This section describes how to use the customer configuration worksheet to simplify the sizing calculations, along with additional factors you should take into consideration when deciding which architecture to deploy.

The customer configuration worksheet helps you assess the customer environment and calculate the sizing requirements of that environment.

Table 7 shows a completed worksheet for a sample customer environment.


Table 7. Sample Customer Configuration Worksheet

(Columns: CPU in virtual CPUs; memory in GB; IOPS; capacity in GB; equivalent reference virtual machines)

Example application 1: Custom-built application
  Resource requirements: CPU 1; memory 3; IOPS 15; capacity 30
  Equivalent reference virtual machines: CPU 1; memory 2; IOPS 1; capacity 1; total 2

Example application 2: Point-of-sale system
  Resource requirements: CPU 4; memory 16; IOPS 200; capacity 200
  Equivalent reference virtual machines: CPU 4; memory 8; IOPS 8; capacity 2; total 8

Example application 3: Web server
  Resource requirements: CPU 2; memory 8; IOPS 50; capacity 25
  Equivalent reference virtual machines: CPU 2; memory 4; IOPS 2; capacity 1; total 4

Total equivalent reference virtual machines: 14

To complete the customer configuration worksheet, follow these steps:

1. Identify the applications planned for migration into the VSPEX virtualized environment.

2. For each application, determine the compute resource requirements for virtual CPUs, memory (GB), storage performance (IOPS), and storage capacity.

3. For each resource type, determine the equivalent reference virtual machines required, that is, the number of reference virtual machines required to meet the specified resource requirements.

4. Determine the total number of reference virtual machines needed from the resource pool for the customer environment.

Determining the Resource Requirements

Consider the following when you determine resource requirements:

CPU: The reference virtual machine outlined earlier in Table 4 assumes that most virtual machine applications are optimized for a single CPU. If an application requires a virtual machine with multiple virtual CPUs, modify the proposed virtual machine count to account for the additional resources.

Memory: Memory plays a key role in ensuring application functionality and performance. Each group of virtual machines will have a different target for the amount of available memory that is considered acceptable. As with the CPU calculation, if an application requires additional memory resources, adjust the number of planned virtual machines to accommodate the additional resource requirements.

For example, if there are 30 virtual machines, but each one needs 4 GB of memory instead of the 2 GB that the reference virtual machine provides, plan for 60 reference virtual machines.

IOPS: The storage performance requirements of virtual machines are usually the least understood aspect of performance. The reference virtual machine uses a workload generated by an industry-recognized tool to run a wide variety of office productivity applications, which should be representative of the majority of virtual machine implementations.


Storage Capacity: The storage capacity requirement for a virtual machine can vary widely, depending on the type of provisioning, the types of applications in use, and specific customer policies.

Determining the Equivalent Reference Virtual Machines

With all of the resources defined, determine the number of equivalent reference virtual machines by using the

relationships listed in Table 8. Round all values to the closest whole number.

Table 8. Equivalent Reference Virtual Machines

Resource   Value for Reference Virtual Machine   Relationship Between Requirements and Equivalent Reference Virtual Machines
CPU        1                                     Equivalent reference virtual machines = resource requirement
Memory     2                                     Equivalent reference virtual machines = resource requirement / 2
IOPS       25                                    Equivalent reference virtual machines = resource requirement / 25
Capacity   100                                   Equivalent reference virtual machines = resource requirement / 100
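
The relationships in Table 8 reduce to a simple ceiling calculation per resource. The following Python sketch is illustrative only (the constant and function names are not part of the VSPEX documentation or tooling); it converts an application's resource requirements into equivalent reference virtual machines:

import math

# Per-resource capacity of one reference virtual machine (from Table 8).
REFERENCE_VM = {"cpu": 1, "memory_gb": 2, "iops": 25, "capacity_gb": 100}

def equivalent_reference_vms(requirements):
    """Return the per-resource equivalents and the number of reference
    virtual machines the application needs (the per-resource maximum).
    Values are rounded up, matching the worksheet examples."""
    per_resource = {
        resource: math.ceil(requirements[resource] / unit)
        for resource, unit in REFERENCE_VM.items()
    }
    return per_resource, max(per_resource.values())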

For example, application 2 in the customer configuration worksheet in Table 7 requires four CPUs, 16 GB of memory, 200 IOPS, and 200 GB of storage. This translates to four reference virtual machines for CPU, eight for memory, eight for IOPS, and two for capacity. Table 9 shows how that application fits into the worksheet row.

Table 9. Equivalent Reference Virtual Machines for Example Application 2 in Table 7

Application                                CPU (virtual CPUs) Memory (GB) IOPS Capacity (GB) Equivalent Reference Virtual Machines
Example application
  Resource requirements                    4                  16          200  200           —
  Equivalent reference virtual machines    4                  8           8    2             8

Use the highest value in the row to complete the Equivalent Reference Virtual Machines column; this example requires eight reference virtual machines. The number of reference virtual machines required for each application equals the maximum of its per-resource values, because that number satisfies the CPU, memory, IOPS, and capacity requirements simultaneously. For the application in Table 9, that maximum is eight.

Determining the Total Reference Virtual Machines

After the worksheet is completed for each application to be migrated into the virtual infrastructure, compute the total number of reference virtual machines required in the resource pool by summing the reference virtual machines for all application types. In the example in Table 7, the total is 14 reference virtual machines.
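
Continuing the sketch above, feeding the three example applications from Table 7 through the same helper reproduces the worksheet totals:

# Resource requirements taken from Table 7.
applications = {
    "Custom-built application": {"cpu": 1, "memory_gb": 3, "iops": 15, "capacity_gb": 30},
    "Point-of-sale system":     {"cpu": 4, "memory_gb": 16, "iops": 200, "capacity_gb": 200},
    "Web server":               {"cpu": 2, "memory_gb": 8, "iops": 50, "capacity_gb": 25},
}

total = 0
for name, requirements in applications.items():
    per_resource, needed = equivalent_reference_vms(requirements)
    total += needed
    print(f"{name}: {per_resource} -> {needed} reference virtual machines")

print(f"Total equivalent reference virtual machines: {total}")  # prints 14, matching Table 7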

Deployment

Please refer to Chapter 6, “VSPEX Solution Implementation,” in the EMC VSPEX Private Cloud: VMware vSphere and EMC ScaleIO Proven Infrastructure Guide, for deployment details: http://www.emc.com/collateral/technical-documentation/h13156-vspex-private-cloud-vmware-vsphere-scaleio-pig.pdf

The following network option from the “Network Implementation” section of the “VSPEX Solution Implementation” chapter is provided to take advantage of the operational efficiencies afforded by Cisco UCS rack servers with the Cisco UCS VIC 1225 adapter connected to the Cisco UCS 6248UP fabric interconnects. This setup functions as a converged management and storage traffic network, simplifying the topology.

The following link provides a quick setup of the Cisco UCS environment with fabric interconnects: http://www.cisco.com/c/en/us/products/collateral/servers-unified-computing/ucs-manager/whitepaper_c11-697337.html

At the end of this procedure, you will have built a seven-node ScaleIO cluster of Cisco UCS C240 rack servers, each with a VIC 1225 adapter. Figure 11 shows a snapshot of the service profiles and network settings.

Figure 11. Service Profiles and Network Settings


Test and Validation

Because of the scale-out, multiple-node architecture of ScaleIO, EMC recommends that you consider the possibility of the loss of a system node. ScaleIO is designed to keep copies of data on multiple nodes to protect against such a loss. A node loss affects the virtual machines running on that node, but it should not affect the other users of the ScaleIO environment.

Post-Installation Checklist

This section provides a list of items to review and tasks to perform after configuring the solution. The goal is to verify the configuration and functionality of specific aspects of the solution and to ensure that the configuration meets core availability requirements.

Table 10 lists tasks for testing the installation.

Table 10. Post-Installation Checklist

Basic checks
● Verify that sufficient virtual ports exist on each vSphere host virtual switch. (Reference: vSphere Networking)
● Verify that each vSphere host has access to the required ScaleIO datastores and VLANs. (References: vSphere Storage Guide; vSphere Networking)
● Verify that the vMotion interfaces are configured correctly on all vSphere hosts. (Reference: vSphere Networking)

Deploy and test a single virtual server
● Deploy a single virtual machine using the vSphere interface. Verify that the virtual machine is joined to the application domain, can be logged into, and has access to expected networks. (References: vCenter Server and Host Management; vSphere Virtual Machine Management)

Verify redundancy of the solution components
● Verify the data protection of the ScaleIO system: restart one ScaleIO node and ensure that shared volume access is maintained. (Reference: the Failure Testing section below)
● Disable each of the redundant network switches in turn and verify that the vSphere host and virtual machines remain intact. (Reference: vendor documentation)
● On a vSphere host that contains at least one virtual machine, enable maintenance mode and verify that the virtual machine can successfully migrate to an alternate host. (Reference: vCenter Server and Host Management)
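
Some of the basic checks in Table 10 lend themselves to scripting. The following is a minimal sketch, not part of the validated solution, that uses the open-source pyVmomi SDK to list the datastores visible to each vSphere host so that ScaleIO datastore access can be confirmed quickly; the vCenter hostname and credentials shown are placeholders.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Lab use only: skip certificate validation. Use valid certificates in production.
context = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=context)

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    # Confirm that the expected ScaleIO datastores appear for every host.
    names = sorted(ds.name for ds in host.datastore)
    print(f"{host.name}: {names}")

view.DestroyView()
Disconnect(si)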

Failure Testing

To provide for system maintenance and hardware failures, a set of virtual machines running the reference workload was started on two of the three nodes in a ScaleIO environment; no virtual machines ran on the remaining node. At a predetermined point, the node with no virtual machines was turned off. Predictably, the I/O latency of the system was affected by the loss of one third of the storage resources, but the virtual machines running on the other nodes were still able to access all of their data. When the node was replaced, rebalancing occurred automatically in the background, without operator intervention and with minimal impact on applications and users. Similar node and disk failure scenarios were conducted on a seven-node cluster; the rebuild rate was monitored while application availability and cluster resiliency were verified.

Note: Similar tests with virtual machines running on all nodes show the expected result: vSphere High Availability (configured for the non-ScaleIO virtual machines) restarts the affected virtual machines on the surviving nodes until the restart criteria can no longer be met.

EMC recommends that you include one more node than the workload requires, to help ensure that you can support the environment during an outage or during system maintenance. The spare node should be configured to be as large as the largest active node in the cluster so that it can accommodate a node failure; a rough capacity check is sketched below.
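
The following is a minimal sketch of that capacity check, under stated assumptions: ScaleIO keeping two copies of the data, and nodes built like the test cluster described next (nine 900-GB disks each). The values are illustrative, not sizing guidance.

# Assumed values for illustration only.
DATA_COPIES = 2                    # assumed ScaleIO mirroring factor
node_capacity_gb = [9 * 900] * 7   # seven nodes, nine 900-GB disks each (8100 GB raw per node)
workload_gb = 14 * 100             # 14 reference virtual machines at 100 GB each

# Usable raw capacity if the single largest node is lost.
surviving_gb = sum(node_capacity_gb) - max(node_capacity_gb)
required_gb = DATA_COPIES * workload_gb

# True here: the remaining six nodes can still hold the mirrored workload.
print(f"survives loss of largest node: {surviving_gb >= required_gb}")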


Another seven-node cluster with Cisco UCS 6248UP fabric interconnects was built to conduct additional failure testing for cluster resiliency (Figure 12). This cluster and its nodes were not profiled for virtual machine capacity, because that exercise and the guidance stemming from it were completed earlier with reference virtual machines. Each node in this setup consists of a Cisco UCS C240 rack server with a VIC 1225 adapter and nine internal 900-GB 10,000-rpm SAS disks under ScaleIO management.

Two storage pools were created within the protection domain. A set of virtual machines running the reference workload was started on nodes in one storage pool. Disk and node failures were introduced while connectivity, rebuild behavior, and performance were observed. The focus of these tests was to verify cluster resiliency without adversely affecting performance. To confirm the logical separation between storage pools, single-disk failures were introduced in both pools concurrently. An MDM node failure was also forced to check cluster setup and operation in the absence of the master node. In all of these tests, the cluster continued to operate, and the application workload remained available and operational within expected performance levels.

ScaleIO can group a set of nodes within a protection domain as a fault set. Fault sets come into play when data is mirrored: ScaleIO performs mirroring only across fault sets, to protect against the possible loss of a whole set of nodes that share characteristics such as firmware level, for example during maintenance. This feature was not tested.

Figure 12. ScaleIO Fault Units

Figure 13 shows a snapshot of the dashboard with relevant metrics.


Figure 13. Dashboard Showing Metrics

Monitoring

This VSPEX proven infrastructure provides an end-to-end solution that requires system monitoring of three discrete but highly interrelated areas:

● Servers, both virtual machines and clusters
● Networking
● ScaleIO

Given the purview of this white paper, this section focuses primarily on monitoring key components of the ScaleIO infrastructure. Server resources (processing, memory, and disk) and network usage can be measured with tools such as esxtop and perfmon, and storage-related metrics can be measured with vdbench. Key network metrics to track include aggregate throughput and latency.
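
For example, esxtop can be run in batch mode (esxtop -b -d 5 -n 12 > stats.csv) and the resulting CSV reviewed offline; see “Interpreting esxtop Statistics” in the References. The following is a minimal sketch, assuming the file name above; it filters column names by substring because the exact counter names vary by host and device.

import csv

with open("stats.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    # Select physical-disk latency counters; adjust the substrings for the
    # counters of interest in your environment.
    wanted = [i for i, name in enumerate(header)
              if "Physical Disk" in name and "MilliSec" in name]
    for row in reader:
        # Each row is one sample interval; print the selected latency values.
        print([row[i] for i in wanted])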

ScaleIO Layer

Monitoring the ScaleIO layer of a VSPEX implementation is crucial to maintaining the overall health and performance of the system. The ScaleIO GUI enables you to review the overall status of the system, drill down to the component level, and monitor individual components. The various screens display views and data that are useful to the storage administrator, and they provide an easy yet powerful way to gain insight into how the underlying ScaleIO components are operating. Key areas to focus on include:

● Dashboard screen


● Protection domain screen

● Protection domain servers screen

● Storage pool screen

The EMC ScaleIO User Guide on EMC Online Support provides more instructions for monitoring the ScaleIO layer.

References

The following documents, available on EMC Online Support, provide additional relevant information. If you do not have access to a document, contact your EMC representative.

● EMC VSPEX Private Cloud: VMware vSphere and EMC ScaleIO Proven Infrastructure Guide

● EMC Host Connectivity Guide for VMware ESX Server

● EMC ScaleIO User Guide

The following documents, available on the VMware website, provide additional relevant information:

● vSphere Networking

● vSphere Storage Guide

● vSphere Virtual Machine Administration

● vSphere Virtual Machine Management

● vSphere Installation and Setup

● vCenter Server and Host Management

● vSphere Resource Management

● Interpreting esxtop Statistics

● Preparing vCenter Server Databases

● Understanding Memory Resource Management in VMware vSphere 5.0

For documentation on Microsoft products, refer to the following Microsoft resources:

● Microsoft Developer Network

● Microsoft TechNet

Printed in USA C11-733544-00 12/14