
Microsoft SQL Server 2019 Big Data Clusters on Cisco UCS Reference Architecture

White Paper

Cisco Public


Contents

Executive summary
Audience and scope
Technology overview
Solution design
Infrastructure sizing guidelines
BDC deployment on Cisco UCS
Validation
Performance validation
Monitoring the Kubernetes cluster and hosts with AppDynamics
Conclusion
Appendix A: Cisco UCS C240 M5 storage options
Appendix B: Bills of materials
Appendix C: Customized BDC deployment on Cisco UCS


Executive summary

Over the past several decades, enterprises have generated enormous amounts of data of many types, and much of this data has been stored on disparate, isolated storage systems to support ever-changing business models. The result has been data sprawl and operational challenges, making it difficult for organizations to get a holistic view of all their data sets so that they can gain deep insights from all the data stored across the organization.

Microsoft introduced SQL Server 2019 Big Data Clusters (BDC) to address this challenge. BDC provides a unified, scalable data platform that enables organizations to get a unified view of their various data sets stored across isolated data management systems without having to replicate or move the data. It enables organizations to join structured and unstructured data sets to gain deeper insight into their data to help them make better business decisions.

Prior to SQL Server 2019, a single instance of SQL Server could not be used for large-scale analytics. SQL Server was not designed or built to be a database engine for analytics on the scale of petabytes or exabytes. It also was not designed for scale-out computing for data processing or machine learning, nor for storing and analyzing data in unstructured formats, such as media files. SQL Server 2019 extends its unified data platform to encompass big data and unstructured data by deploying multiple instances of SQL Server together with Spark and Hadoop Distributed File System (HDFS) as a BDC. SQL Server 2019 BDC provides a compelling new way to use SQL Server to bring high-value relational data and high-volume big data together on a unified, scalable data platform for SQL-centric data processing and machine-learning use cases.

When selecting the hardware infrastructure for enterprise application deployments such as BDC, enterprises should choose hardware that will both optimize and simplify deployments and provide improved management and performance, thereby achieving better return on investment (ROI).

The Cisco Unified Computing System™ (Cisco UCS®) integrates network, computing, storage access, and virtualization resources into a single cohesive system, providing an optimized infrastructure that enables organizations to get the most from BDC deployments. Cisco UCS provides an integrated low-latency, lossless, 25, 40, and 100 Gigabit Ethernet unified network fabric with enterprise-class x86-architecture servers and a variety of local-storage options, helping optimize BDC deployments and improve performance. Various Cisco UCS platforms are available, including Cisco HyperFlex™ systems, enabling deployment of SQL Server 2019 BDC for optimal performance with efficient use of resources.

This document describes a reference architecture for hosting a customized BDC deployment on Cisco HyperFlex systems and Cisco UCS rack servers. The document discusses how to deploy an optimized BDC on premises in the data center using an application-centric Cisco UCS environment. It also provides deployment guidelines, including guidelines for sizing the system.

Note that this document uses the terms Big Data Clusters and BDC interchangeably to refer to Microsoft SQL Server 2019 Big Data Clusters.


Audience and scope

This document is written for IT professionals, data scientists, infrastructure administrators, Hadoop architects, and database specialists who are responsible for planning, designing, and implementing BDC in their organizations. Readers should have some knowledge about Cisco UCS, Cisco HyperFlex hyperconverged systems, big data technologies, and Microsoft database technologies.

The document describes a reference architecture for deploying BDC on Cisco UCS using the Cisco HyperFlex platform and Cisco UCS C240 M5 Rack Servers.

Technology overview

This section provides an overview of the technologies used in the solution architecture described in this document.

Microsoft SQL Server 2019 Big Data Clusters

This section provides an overview of BDC.

BDC architecture

Microsoft SQL Server 2019 Big Data Clusters provides a new way to use SQL Server to bring high-value relational data and high-volume big data together on a unified, scalable data platform. BDC allows you to deploy scalable clusters of SQL Server, Spark, and HDFS containers running on Kubernetes. These components run side by side to enable you to read, write, and process big data from Transact-SQL (T-SQL) or Spark, allowing you to easily combine and analyze your high-value relational data with high-volume big data.

Deploying BDC on Kubernetes helps ensure a predictable, fast, and elastically scalable deployment, regardless of the location of the deployment. BDC can be deployed in any cloud that uses a managed Kubernetes service, such as Microsoft Azure Kubernetes Service (AKS), or in on-premises Kubernetes clusters. Built-in management services in BDC provide log analytics, monitoring, backup, and high availability through an administrator portal, helping ensure a consistent management experience wherever BDC is deployed.

Figure 1 shows the high-level architecture of Big Data Clusters. On a given infrastructure, whether on-premises or in the cloud, a Kubernetes cluster is deployed along with the appropriate container network and storage interface plug-ins required by workloads running on the Kubernetes cluster. On the Kubernetes cluster, BDC is deployed using the Microsoft azdata tool and can be managed using the Microsoft Azure Data Studio tool.
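As a rough sketch of that workflow (the kubeadm-prod profile name and the custom-bdc target directory are illustrative choices, and credentials are supplied through environment variables), a BDC deployment with azdata follows this general pattern:

azdata bdc config init --source kubeadm-prod --target custom-bdc
# Edit custom-bdc/control.json and custom-bdc/bdc.json as needed (see Appendix C), then deploy:
export AZDATA_USERNAME=admin
export AZDATA_PASSWORD='<strong password>'
azdata bdc create --config-profile custom-bdc --accept-eula yes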


Figure 1. Microsoft SQL Server 2019 Big Data Clusters

In the proposed reference architecture, BDC is deployed in an on-premises Kubernetes cluster that is built on Cisco UCS and Cisco HyperFlex systems.

The next sections discuss the individual components of BDC and how each BDC pool benefits from Cisco infrastructure.

BDC storage pool: Scalable shared HDFS storage for big data processing

The SQL Server 2019 relational database engine in a BDC uses an elastically scalable storage layer that integrates SQL Server and HDFS to scale to petabytes of data storage. The Spark engine that is now part of SQL Server enables data engineers and data scientists to harness the power of open-source data preparation and query programming libraries to process and analyze high-volume data in a scalable, distributed, in-memory computing layer.

Figure 2 shows a single instance of a storage pod with a Hadoop data node, SQL Server, and Spark services running as containers together. BDC allows organizations to scale such pods, providing scalable HDFS storage for big data processing. This storage can be used to store unstructured and semistructured big data potentially ingested from various external data sources. After data is stored in the HDFS storage of BDC, organizations can analyze and query the data and combine it with other relational data for more meaningful insight into the data.


Figure 2. Microsoft SQL Server 2019 Big Data Clusters storage pool pod

Spark is a distributed computing platform for processing large amounts of data stored in data platforms such as Hadoop. In BDC, the Spark container is co-located with the HDFS data node container, enabling data locality. When a Spark job is submitted, Spark tries to map the computation tasks (executors) to the same Hadoop data node on which the data is located. This approach improves performance because less data needs to be transferred over the network.

In BDC, the storage pool typically includes computing- and storage-intensive workloads. Spark jobs, which typically process large volumes of data in memory, benefit from computation- and memory-dense platforms. Hadoop data nodes benefit from servers that support huge local-storage capacity. Therefore, the storage pods typically are deployed on bare-metal servers instead of in virtualized environments.

Cisco UCS C-Series Rack Servers are well suited for BDC storage pools. Cisco UCS C240 M5 Rack Servers, powered by Intel® Xeon® processors, offer industry-leading performance and are completely managed by Cisco UCS. These servers support a variety of storage options, offering flexibility that allows organizations to select the right configuration based on the storage capacity and I/O bandwidth requirements of the BDC storage pool. Cisco UCS C240 M5 servers support 25 and 40 Gigabit Ethernet low-latency, lossless, converged Ethernet networking options: important for Hadoop internal data replication, data ingestion, and query processing. For more details about C240 M5 storage options, see Appendix A of this document.

For massive Hadoop storage capacity requirements, Cisco UCS offers Cisco UCS S-Series Storage Servers. A single Cisco UCS S3260 chassis offers massive 840-TB data storage capacity that easily scales to petabytes. It also supports both 25 and 40 Gigabit Ethernet networking options.


BDC primary pool: Data virtualization with enhanced PolyBase connectors

SQL Server 2019 PolyBase allows organizations to connect to and query various structured and unstructured external data sources, thereby creating a data hub that integrates data from the entire data estate. PolyBase in SQL Server 2019 has connectors for various data sources such as SQL Server, Azure SQL Database, Azure SQL Data Warehouse, Azure Cosmos DB, MySQL, PostgreSQL, MongoDB, Oracle, Teradata, and HDFS. It enables organizations to join multiple data sources with regular T-SQL commands to get greater insight from multiple data sets at the same time. Figure 3 shows multiple data sources queried with SQL Server 2019 PolyBase and traditional T-SQL commands.

Figure 3. Data virtualization with Microsoft SQL Server 2019 PolyBase
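As an illustrative sketch only (the database name, credential, Oracle server, table definitions, and port are placeholders and not part of the validated configuration), an external table over an Oracle source can be defined and joined with local data from a shell session using sqlcmd and standard PolyBase T-SQL:

sqlcmd -S <SQL Server master instance endpoint>,31433 -U admin -P '<password>' -d Sales -Q "
CREATE EXTERNAL DATA SOURCE OracleSales
    WITH (LOCATION = 'oracle://oracledb.example.com:1521', CREDENTIAL = OracleCred);
CREATE EXTERNAL TABLE dbo.OracleOrders (OrderID INT, CustomerID INT, Amount DECIMAL(10,2))
    WITH (LOCATION = 'XE.SALES.ORDERS', DATA_SOURCE = OracleSales);
SELECT c.CustomerName, o.Amount
FROM dbo.Customers AS c
JOIN dbo.OracleOrders AS o ON o.CustomerID = c.CustomerID;"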

The SQL Server primary pool also acts as a gateway to the Big Data Clusters and is where most user and application connections are handled. Therefore, the availability of SQL primary pool services is critical to BDC.

In this reference architecture, a Cisco HyperFlex cluster is used to host the SQL primary pool. The Cisco HyperFlex cluster provides a highly available computing and distributed storage platform for hosting enterprise-critical applications. Its distributed architecture provides additional data protection to the SQL primary pool from local intermittent failures such as node and disk failures. In addition to the built-in high-availability features of SQL Server, the SQL primary pool can take advantage of the high-availability features offered by the Cisco HyperFlex cluster, thereby increasing the overall availability of the SQL primary pool. Powered by 2nd Generation Intel Xeon Scalable (Cascade Lake) CPUs, Cisco HyperFlex clusters can provide the required computing resources to the SQL primary pool.


BDC data pool: Scale-out data mart

Big Data Clusters provide distributed and scale-out computing and storage to improve the performance of data analysis. When the enhanced PolyBase connectors are combined with BDC data pools, data from external data sources can be partitioned and cached across all the SQL Server instances in a data pool, creating a scale-out data mart. A given data pool can contain more than one scale-out data mart, and a data mart can combine data from multiple external data sources and tables, making it easy to integrate and cache combined data sets from multiple external sources.

Figure 4 shows a data pool consisting of multiple SQL Server pods, also known as shards. Data can be ingested from various external resources into the data pool. The cached data stored in the data pool is distributed across the SQL Server pods, enabling faster data analysis.

Figure 4. Scale-out data mart using Microsoft SQL Server 2019 Big Data Clusters data pool

In this reference architecture, BDC data pools are deployed on Cisco HyperFlex clusters. The Cisco HyperFlex system provides the highly available storage and computing resources required by the BDC data pool. The 25 and 40 Gigabit Ethernet low-latency, lossless networking options of the Cisco HyperFlex cluster, coupled with the superior I/O performance of the Cisco HyperFlex system, enable faster data ingestion from various data sources, such as online transaction processing (OLTP) systems and the Internet of Things (IoT). Additionally, when deployed on a Cisco HyperFlex cluster, data pool pods benefit from the consistent performance, data protection, and high-availability features of Cisco HyperFlex systems.

BDC compute pool: Improved PolyBase query performance

The purpose of the compute pool is to provide more computing resources to the Big Data Clusters. The performance of PolyBase queries can be boosted further by distributing the cross-partition aggregation and shuffling of the filtered query results to compute pools consisting of multiple SQL Server instances that work together. Compute pool pods are stateless and are not meant for persistent data storage. Some tasks can be offloaded from the SQL primary instance to the compute pool, thereby reducing the load on the BDC primary pool.


Cisco HyperFlex clusters offer a unique capability through which organizations can scale computing resources alone simply by adding computing-only nodes to the existing cluster. Computing-only nodes are a good choice for deploying BDC compute pools. The Cisco HyperFlex system supports a variety of servers for computing-only nodes. For large computing requirements, organizations can use a 4-socket Cisco UCS C480 M5 Rack Server for the BDC compute pool and get maximum benefit from the underlying dense computing platform.

Control plane

The control plane is responsible for the management and security aspects of BDC. It contains the control service, the configuration store, and other cluster-level management and monitoring services such as Kibana, Grafana, and Elasticsearch. This pool also includes critical services such as the Hadoop NameNode, Spark head, and ZooKeeper services. Therefore, control pool availability is critical for BDC accessibility.

Control-plane services are typically lightweight and can be deployed on the Cisco HyperFlex cluster, which also provides additional high availability for these services.

Integrated artificial intelligence and machine learning

Big Data Clusters enable artificial intelligence (AI) and machine-learning tasks on the data stored in HDFS storage pools and in data pools. You can use Spark as well as built-in AI tools in SQL Server, using R, Python, Scala, SQL, or Java.

Built-in monitoring and management of BDC

BDC has built-in monitoring capabilities through which all the components of the cluster can be monitored. Built-in management services in BDC provide log analytics, monitoring, backup, and high availability through an administrator portal, helping ensure a consistent management experience wherever BDC is deployed.

Azure Data Studio can be used to manage BDC and allows organizations to perform various tasks, such as browsing HDFS, uploading files, and running compatible notebooks. The azdata tool is used to manage BDC end to end.
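For example, after logging in to the cluster, azdata can report overall health and the available service endpoints; the namespace name bdc below is an assumption and must match the cluster name chosen at deployment:

azdata login --namespace bdc
azdata bdc status show --all
azdata bdc endpoint list --output table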

Cisco Unified Computing System

Cisco UCS is a next-generation data center platform that unites computing, network, and storage access resources. The platform, optimized for virtual environments, is designed using open industry-standard technologies and aims to reduce total cost of ownership (TCO) and increase business agility. The system integrates a low-latency, lossless 10, 25, or 40 Gigabit Ethernet unified network fabric with enterprise-class x86-architecture servers. It is an integrated, scalable, multichassis platform in which all resources participate in a unified management domain.

The main components of Cisco UCS are as follows:

● Computing: The system is based on an entirely new class of computing system that incorporates rack-mount and blade servers based on Intel Xeon processors.
● Network: The system is integrated onto a low-latency, lossless, 10-, 25-, or 40-Gbps unified network fabric, with an option for 100-Gbps uplinks. This network foundation consolidates LANs, SANs, and high-performance computing networks, which are often separate networks today. The unified fabric lowers costs by reducing the number of network adapters, switches, and cables, and by decreasing power and cooling requirements.


● Virtualization: The system unleashes the full potential of virtualization by enhancing the scalability, performance, and operational control of virtual environments. Cisco® security, policy enforcement, and diagnostic features are now extended into virtualized environments to better support changing business and IT requirements.
● Storage access: The system provides consolidated access to both SAN storage and network-attached storage (NAS) over the unified fabric. By unifying storage access, Cisco UCS can access storage over Ethernet, Fibre Channel, Fibre Channel over Ethernet (FCoE), and Small Computer System Interface over IP (iSCSI) protocols. This capability provides organizations with their choice of storage protocol and physical architecture, along with enhanced investment protection. In addition, server administrators can pre-assign storage-access policies for system connectivity to storage resources, simplifying storage connectivity and management for increased productivity.
● Management: The system uniquely integrates all system components, which enables the entire solution to be managed as a single entity by Cisco UCS Manager. Cisco UCS Manager has an intuitive GUI, a command-line interface (CLI), and a robust API to manage all system configuration and operations. Cisco UCS can also be managed by Cisco Intersight™ software, a cloud-based management and monitoring platform that offers a single-pane portal for multiple Cisco UCS deployments across multiple locations.

Cisco UCS is designed to deliver these benefits:

● Reduced TCO and increased business agility
● Increased IT staff productivity through just-in-time provisioning and mobility support
● A cohesive, integrated system that unifies the technology in the data center; the system is managed, serviced, and tested as a whole
● Scalability through a design for hundreds of discrete servers and thousands of virtual machines and the capability to scale I/O bandwidth to match demand
● Industry standards supported by a partner ecosystem of industry leaders

Cisco UCS fabric interconnects

The Cisco UCS fabric interconnect is a core part of Cisco UCS, providing both network connectivity and management capabilities for the system. Depending on the model chosen, the Cisco UCS fabric interconnect offers line-rate, low-latency, lossless Ethernet, FCoE, and Fibre Channel connectivity. Cisco UCS fabric interconnects provide the management and communication backbone for the Cisco UCS C-Series Rack Servers, Cisco UCS S-Series Storage Servers, Cisco HyperFlex HX-Series nodes, Cisco UCS B-Series Blade Servers, and Cisco UCS 5100 Blade Server Chassis. All servers and chassis, and therefore all blades, attached to the Cisco UCS fabric interconnects become part of a single, highly available management domain. In addition, by supporting unified fabrics, Cisco UCS fabric interconnects provide both LAN and SAN connectivity for all servers within their domain. They support Cisco low-latency, lossless Ethernet unified network fabric capabilities, thereby increasing the reliability, efficiency, and scalability of Ethernet networks. The fabric interconnects support multiple traffic classes over the Ethernet fabric from the servers to the uplinks. Organizations gain significant TCO savings from an FCoE-optimized server design in which network interface cards (NICs), host bus adapters (HBAs), cables, and switches can be consolidated.


Cisco UCS 6332-16UP Fabric Interconnect

The Cisco UCS 6332-16UP Fabric Interconnect (Figure 5) is a 1-rack-unit (1RU) 10 and 40 Gigabit Ethernet, FCoE, and native Fibre Channel switch offering up to 2.43 Tbps of throughput. The switch has 24 x 40-Gbps fixed Ethernet and FCoE ports, plus 16 x 1/10-Gbps fixed Ethernet and FCoE ports or 4/8/16-Gbps Fibre Channel ports. Up to 18 of the 40-Gbps ports can be reconfigured as 4 x 10-Gbps breakout ports, providing up to 88 total 10-Gbps ports, although Cisco HyperFlex nodes must use a 40 Gigabit Ethernet virtual interface card (VIC) adapter to connect to a Cisco UCS 6300 Series Fabric Interconnect.

Figure 5. Cisco UCS 6332-16UP Fabric Interconnect

Cisco UCS 6454 Fabric Interconnect

The Cisco UCS 6454 Fabric Interconnect (Figure 6) is a 1RU 10, 25, 40, and 100 Gigabit Ethernet, FCoE, and Fibre Channel switch offering up to 3.82 Tbps of throughput and up to 54 ports. The switch has 28 x 10/25-Gbps Ethernet ports, 4 x 1/10/25-Gbps Ethernet ports, 6 x 40/100-Gbps Ethernet uplink ports, and 16 unified ports that can support 10/25-Gbps Ethernet or 8/16/32-Gbps Fibre Channel. All Ethernet ports are capable of supporting FCoE. Cisco HyperFlex nodes can connect at 10- or 25-Gbps speeds, depending on the model of Cisco VIC in the nodes and the Small Form-Factor Pluggable (SFP) optics or cables chosen.

Figure 6. Cisco UCS 6454 Fabric Interconnect

Cisco UCS C240 M5 Rack Server

The Cisco UCS C240 M5 Rack Server (Figure 7) is a 2-socket, 2RU server offering industry-leading performance and expandability. It supports a wide range of storage- and I/O-intensive infrastructure workloads, from big data and analytics to collaboration. It incorporates Intel Xeon Scalable processors, supporting up to 28 cores per CPU and a total of 56 cores per server. It supports different disk backplane options, a feature that is especially important for I/O-intensive workloads such as databases and Hadoop clusters. It supports up to 26 hot-swappable small-form-factor (SFF) 2.5-inch drives, including 2 rear hot-swappable SFF drives (up to 10 slots support Non-Volatile Memory Express [NVMe] PCIe solid-state disks [SSDs] on the NVMe-optimized chassis version), or 12 large-form-factor (LFF) 3.5-inch drives plus 2 rear hot-swappable SFF drives. Refer to Appendix B of this document for more details about the backplane options for the server.


Figure 7. Cisco UCS C240 M5 Rack Server

For more information about the Cisco UCS C240 M5 server, refer to https://www.cisco.com/c/en/us/products/collateral/servers-unified-computing/ucs-c-series-rack-servers/datasheet-c78-739279.html.

Cisco HyperFlex systems

Cisco HyperFlex systems are based on the Cisco UCS platform, combining Cisco HyperFlex HX-Series x86 servers and integrated networking technologies through Cisco UCS fabric interconnects into a single management domain. Cisco HyperFlex systems also include industry-leading virtualization hypervisor software from VMware and next-generation software-defined storage technology. These combined technologies create a complete virtualization platform, providing network connectivity for guest virtual machine connections and distributed storage to house the virtual machines spread across all the Cisco UCS x86 servers, instead of using specialized storage or networking components. The unique storage features of the Cisco HyperFlex log-based file system enable rapid cloning of virtual machines, snapshots without the traditional performance penalties, and inline data deduplication and compression. All configuration, deployment, management, and monitoring of the solution can be performed with existing tools for Cisco UCS and VMware vSphere, such as Cisco UCS Manager and VMware vCenter, and new integrated HTML-based management tools, such as Cisco HyperFlex Connect and Cisco Intersight software.

This powerful linking of advanced technology stacks into a single, simple, rapidly deployed solution makes Cisco HyperFlex systems a true second-generation hyperconverged platform. Cisco HyperFlex systems are compatible with and support any version or distribution of Kubernetes that can be virtualized and run in a supported virtual machine guest operating system.

Cisco HyperFlex HX-Series nodes

A standard Cisco HyperFlex cluster requires a minimum of three Cisco HyperFlex HX-Series converged nodes: that is, nodes with shared disk storage. Data is replicated across at least two of these nodes, and a third node is required for continuous operation in the event of a single-node failure. Each node that has disk storage is equipped with at least one high-performance SSD for data caching and rapid acknowledgment of write requests. Each node also is equipped with additional disks, up to the platform's physical limit, for long-term storage and capacity.

Cisco HyperFlex HX240c M5SX All Flash Node

The capacity-optimized Cisco HyperFlex HX240c M5SX All Flash Node (Figure 8) contains a 240-GB M.2 SSD that acts as the boot drive; a 240-GB housekeeping SSD; a single 375-GB Intel® Optane™ NVMe drive, 1.6-TB NVMe drive, or 1.6-TB SAS SSD write-log drive installed in a rear hot-swappable slot; and six to twenty-three 960-GB or 3.8-TB SATA SSDs for storage capacity. These servers are powered by Intel Xeon Scalable CPUs and provide the computing resources required by the workloads. Optionally, the Cisco HyperFlex Acceleration Engine card can be added to improve write performance and compression. For configurations requiring self-encrypting drives (SEDs), the caching SSD is replaced with an 800-GB SAS SED SSD, and the capacity disks are also replaced with 960-GB or 3.8-TB SED SSDs.


Figure 8. Cisco HyperFlex HX240c M5SX All Flash Node

For more information about Cisco HyperFlex systems, refer to https://www.cisco.com/c/dam/en/us/products/collateral/hyperconverged-infrastructure/hyperflex-hx-series/solution-overview-c22-736815.pdf.

Cisco HyperFlex HX220c M5N All NVMe Node

This small-footprint Cisco HyperFlex HX220c M5N All NVMe Node (Figure 9) contains a 240-GB M.2 SSD that acts as the boot drive, a 1-TB housekeeping NVMe drive, a single 375-GB Intel Optane NVMe write-log drive, and six to eight 1- or 4-TB NVMe drives for storage capacity. These servers are powered by Intel Xeon Scalable CPUs and provide the computing resources required by the workloads. Optionally, the Cisco HyperFlex Acceleration Engine card can be added to improve write performance and compression. Self-encrypting drives are not available as an option for the all-NVMe nodes.

Figure 9.

Cisco HyperFlex HX220c M5SN All NVMe Node

For more information about the Cisco HyperFlex HX220c M5SN All NVMe Node, refer to https://www.cisco.com/c/dam/en/us/products/collateral/hyperconverged-infrastructure/hyperflex-hx-series/hxaf220c-m5-specsheet-nvme.pdf.

Cisco UCS infrastructure management and automation

Several options are available for managing Cisco UCS.

Cisco Intersight cloud-based management

The Cisco Intersight cloud-based management tool is designed to provide centralized off-site management, monitoring, and reporting for all Cisco UCS solutions. It can be used to deploy and manage Cisco HyperFlex clusters. The Cisco Intersight platform offers direct links to Cisco UCS Manager and Cisco HyperFlex Connect for the systems it is managing and monitoring. The Cisco Intersight website and framework are constantly upgraded and extended with new and enhanced features independent of the products that are managed, meaning that many new features and capabilities are provided with no downtime or upgrades required by end users. This unique combination of embedded and online technologies results in a complete cloud-based management solution that can care for Cisco HyperFlex systems throughout the entire lifecycle, from deployment through retirement. For more information, see https://intersight.com.

Figure 10 shows how a Cisco HyperFlex cluster is managed from the Cisco Intersight platform. As shown, Cisco HyperFlex Connect can be launched from the Cisco Intersight platform. The platform also provides an option to upgrade the Cisco HyperFlex cluster.


Figure 10. Managing Cisco HyperFlex clusters from the Cisco Intersight platform

Figure 11 shows information about individual Cisco HyperFlex nodes provided through the Cisco Intersight platform.

Figure 11. Cisco HyperFlex node details


Figure 12 shows inventory details for individual Cisco HyperFlex nodes provided through the Cisco Intersight platform.

Figure 12. Cisco HyperFlex node inventory details

Cisco UCS Director

Cisco UCS Director is a heterogeneous platform for private cloud infrastructure as a service (IaaS). It supports a variety of hypervisors along with Cisco and third-party servers, network and storage resources, and converged and hyperconverged infrastructure across bare-metal and virtualized environments. Cisco is creating a path forward to allow customers to transition to the Cisco Intersight platform. Cisco UCS Director can be managed by the Cisco Intersight platform to make updates easier and improve support.

AppDynamics for application monitoring

AppDynamics is an application performance monitoring (APM) solution that provides real-time visibility and insight into IT environments, enabling quick identification and resolution of common IT issues. It enables organizations to take the right action at exactly the right time with automated anomaly detection, rapid root-cause analysis, and a unified view of the entire application ecosystem, including on-premises and public cloud deployments. In addition to core APM capabilities, it offers infrastructure monitoring, Kubernetes cluster monitoring, SQL Server database monitoring, and end-user monitoring, and it allows organizations to create various dashboards for further analysis. For more information about AppDynamics and to get started, refer to https://docs.appdynamics.com/display/PRO45/Getting+Started.

If customers have standardized on AppDynamics for application and infrastructure monitoring, they can use their existing AppDynamics controller to monitor Big Data Clusters as well.


For BDC, customers can use AppDynamics to monitor the underlying Kubernetes cluster using a cluster agent that collects metrics and metadata for the entire cluster, for every node, and for every namespace down to the container level. When the applications deployed on BDC are instrumented with AppDynamics APM agents, the cluster agent allows the organization to view both Kubernetes and APM metrics for those pods, provided that both the cluster agent and the APM agents are reporting data to the same account on the AppDynamics controller.

Upstream Kubernetes services

Big Data Clusters are deployed as a series of interrelated containers that are managed in Kubernetes. Kubernetes is an open-source container-orchestration system that provides a platform for automating the deployment, scaling, and operations of application containers across clusters of hosts. Deploying SQL Server 2019 Big Data Clusters on Kubernetes helps ensure a predictable, fast, and elastically scalable deployment, regardless of where BDC is deployed. BDC can be deployed in any cloud that includes a managed Kubernetes service, such as AKS; in on-premises Kubernetes clusters, such as AKS on Azure Stack; or on server infrastructure from any original equipment manufacturer (OEM).

Solution design

This section describes the solution design for deploying Microsoft SQL Server 2019 Big Data Clusters on Cisco UCS.

Segregation of BDC pools

In this reference architecture, Big Data Clusters are deployed using Cisco HyperFlex clusters and Cisco UCS C240 M5 bare-metal servers. The HDFS storage pool pods are deployed on Cisco UCS C240 M5 servers, and the remaining pools, including the SQL primary, compute, data, and control pools, are deployed on virtual machines running on the Cisco HyperFlex hyperconverged platform.

Typically, Hadoop deployments are implemented on bare-metal servers and use the local disks of the servers. Hadoop is a distributed storage and computing platform with resiliency built into it. Providing an additional layer of resiliency on top of these built-in resiliency features would result in performance degradation and waste resources. Hence, in this reference architecture, the Hadoop storage pool is deployed on Cisco UCS C240 M5 bare-metal servers and uses the local disks of the servers for Hadoop storage. Other BDC pools, including the SQL primary, compute, data, and control pools, are deployed on Cisco HyperFlex clusters. In this reference architecture, a VMware-based Cisco HyperFlex cluster is used. High availability of the Kubernetes virtual machines is achieved using VMware live migration services. The Cisco HyperFlex system also provides the Container Storage Interface (CSI) plug-in, which enables dynamic storage provisioning for the BDC pods deployed on the cluster. This architecture simplifies and standardizes the deployment of all physical components of BDC pool pods.

Physical topology

This section describes the architectural components of the BDC solution tested and validated on Cisco UCS.

Figure 13 shows a sample BDC deployment built in the lab. It consists of a Cisco HyperFlex All-Flash cluster and Cisco UCS C240 M5 Rack Servers. Note that a Cisco HyperFlex All-NVMe cluster consisting of Cisco HyperFlex HX220c M5N All NVMe Nodes can be used in place of the All-Flash cluster.


Figure 13. Microsoft SQL Server 2019 Big Data Clusters reference architecture

Also note that in the reference architecture shown in the figure, more Cisco HyperFlex nodes and Cisco UCS C240 M5 servers can be added to the existing fabric interconnects until all the fabric interconnect ports are filled. For additional nodes beyond the first Cisco UCS domain, a new Cisco UCS domain with a new pair of fabric interconnects can be created and added to the cluster to achieve horizontal scalability.

Table 1 lists the specific hardware and software components used in the reference architecture for validating BDC in the Cisco lab.

Table 1. Hardware and software components used in the reference architecture

Cisco UCS Manager and top-of-rack switches:
● 2 x Cisco UCS 6332-16UP Fabric Interconnects running Cisco UCS Manager Release 4.0(4g), on a 40-Gbps network
● Cisco Nexus® 9000 Series upstream switches running Release 6.1(2), on a 40-Gbps network

Cisco HyperFlex All-Flash system:
● 4 x Cisco HyperFlex HX240c M5SX All Flash Nodes, each configured with 2 x Intel Xeon Gold 6240 CPUs (18 cores per CPU), 768 GB of memory, 1 x 1.6-TB write-cache NVMe drive, and 10 x 960-GB capacity SSDs, running Cisco HyperFlex HX Data Platform Release 4.0.2a
● VMware ESXi with VMware vSphere Release 6.7 U3
● Cisco HyperFlex CSI plug-in, Release 4.0.438

Cisco UCS C240 M5 servers for storage pool:
● 4 x Cisco UCS C240 M5 servers, each configured with 2 x Intel Xeon Gold 6230 CPUs (20 cores per CPU), 768 GB of memory, and 7 x 1.9-TB SSDs, running firmware Release 4.0(4g)C
● Local-storage provisioner image, Release 2.1.0

Kubernetes and BDC cluster details:
● Operating system: Red Hat Enterprise Linux (RHEL) Release 7.6
● Upstream Kubernetes instance: Release 1.15.3
● azdata tool: Release 15.0.4003
● BDC (SQL primary instance): SQL Server 2019 (RTM-CU1) 15.0.4003.23 (X64)

Two Cisco Nexus 9000 Series Switches are used as upstream network switches connected to the customer network. These switches are configured in a virtual port channel (vPC) to provide high availability for the devices connected to them.

The Cisco HyperFlex HX240c M5 All Flash Nodes are used to create a Cisco HyperFlex All-Flash cluster, which provides distributed storage and computing resources to the workloads running on the system. In this solution, BDC control, compute, data, and primary pool pods are hosted on the Cisco HyperFlex cluster. BDC storage pool pods are hosted on Cisco UCS C240 M5 bare-metal nodes. Cisco HyperFlex nodes and Cisco UCS C240 M5 nodes are connected to a pair of Cisco UCS fabric interconnects. The fabric interconnects are connected to a pair of upstream Cisco Nexus 9000 Series Switches. Infrastructure services such as Microsoft Active Directory, Domain Name System (DNS), Network Time Protocol (NTP), and VMware vCenter typically are installed outside the Cisco HyperFlex cluster. These services are used to manage and monitor the stack.

Infrastructure scaling options

This section provides details about the infrastructure scalability options available to support scalable SQL Server 2019 Big Data Clusters deployed on Cisco HyperFlex and Cisco UCS servers. Sizing guidelines are discussed in the sections that follow.

Cisco HyperFlex cluster scalability

Depending on the I/O operations per second (IOPS) and bandwidth requirements of BDC, choose either a Cisco HyperFlex All-Flash or All-NVMe cluster. Both cluster types deliver the consistent IOPS and bandwidth at low latency required by the various BDC pools.

A 4-node Cisco HyperFlex All-Flash or All-NVMe cluster is a good starting point for deploying the SQL primary, data, and compute pools for Big Data Clusters. A Cisco HyperFlex All-Flash or All-NVMe cluster can be scaled up to 32 converged nodes. Cisco HyperFlex computing-only nodes can be added to the cluster in a 1:1 ratio with the converged nodes. A Cisco HyperFlex cluster can be scaled to a maximum of 64 nodes (32 converged plus 32 computing-only nodes). All these nodes are connected to the fabric interconnects with 40- or 25-Gbps network cables.


Cisco UCS C240 M5 scalability

For production deployments, a Hadoop cluster with a replication factor of 3 is recommended. For other environments, a replication factor of 2 or 3 can be used, depending on criticality and availability requirements. To start, a minimum of three Cisco UCS C240 M5 Hadoop nodes (configured with SSDs) is recommended for hosting the storage pool.

Depending on the storage capacity and bandwidth requirements of the storage pool, more Cisco UCS C240 M5 nodes can be added to the Kubernetes cluster. The storage pool can easily be extended simply by adding C240 M5 nodes to the existing Cisco UCS fabric interconnects, up to the limit of the available ports in the fabric interconnects. After that, a new Cisco UCS domain can be created with a new pair of fabric interconnects and added to the cluster to extend it.

Logical topology

Figure 14 shows the overall logical design of SQL Server 2019 Big Data Clusters deployed on Cisco HyperFlex and Cisco UCS servers.

Figure 14. Microsoft SQL Server 2019 Big Data Clusters on Cisco UCS: Logical topology

Cisco HyperFlex All-Flash or All-NVMe nodes and Cisco UCS C240 M5 Rack Servers are managed using a pair of fabric interconnects. A VMware-based Cisco HyperFlex cluster is deployed using Cisco HyperFlex HX-Series nodes. On top of the Cisco HyperFlex cluster, multiple virtual machines are deployed to host the various BDC pools. After the virtual machines have been created, the Red Hat Enterprise Linux (RHEL) operating system is installed on all the virtual machines and on the Cisco UCS C240 M5 servers. The RHEL operating system is configured for Kubernetes, and various tools, such as kubectl, kubelet, and kubeadm, are installed to host the Kubernetes cluster across the virtual machines and Cisco UCS C240 M5 servers. Docker Engine is installed on the virtual machines and Cisco UCS C240 M5 servers to provide runtime operations for the containers.


One of the virtual machines is chosen as the Kubernetes primary node, and the control plane is initialized using the kubeadm init command on the selected virtual machine. This command initializes the control plane and outputs the kubeadm join command, which needs to be run on all the remaining virtual machines and Cisco UCS C240 M5 nodes to join them to the control plane as worker (or minion) nodes. This process creates the Kubernetes cluster across the virtual machines and Cisco UCS C240 M5 servers.

For instructions for initializing the control-plane node, installing the pod network plug-in, and joining the nodes as workers, see https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/.
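The overall flow looks like the following sketch; the pod network CIDR, the control-plane IP address, the pod network plug-in choice, and the token and hash values are placeholders that come from your environment and from the kubeadm init output:

# On the virtual machine chosen as the Kubernetes primary (control-plane) node
sudo kubeadm init --pod-network-cidr=192.168.0.0/16

# Configure kubectl for the administrative user, as instructed by the kubeadm init output
mkdir -p $HOME/.kube && sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install a pod network plug-in (Calico is shown here only as one possible choice)
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

# On each remaining virtual machine and Cisco UCS C240 M5 node, join the cluster as a worker
sudo kubeadm join 10.x.x.x:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>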

After the Kubernetes cluster is installed, Container Storage Interface (CSI) plug-ins need to be installed for the Cisco HyperFlex cluster and Cisco UCS C240 M5 servers for dynamic storage provisioning.

Cisco HyperFlex container storage interface plug-in

The Cisco HyperFlex CSI plug-in enables the Cisco HyperFlex cluster to dynamically provision persistent storage for the BDC pods running on the Cisco HyperFlex cluster. With the CSI plug-in, the Cisco HyperFlex cluster provides shared storage that can be accessed by a BDC pod regardless of the ESXi host on which it is running. This capability enables portability of a Kubernetes host (virtual machine) from one physical host to another, increasing the overall availability of Kubernetes hosts and thereby reducing downtime for BDC pods.

Cisco HyperFlex storage provisioner pods are deployed on each Kubernetes host (virtual machine) running on the Cisco HyperFlex cluster, as shown in Figure 15. A default storage class is created that allows Kubernetes pods to consume storage through the Cisco HyperFlex CSI plug-in.
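A quick way to confirm that the provisioner pods and the default storage class are in place is with standard kubectl queries; the exact pod and storage-class names depend on how the plug-in was installed:

kubectl get pods --all-namespaces | grep -i provisioner
kubectl get storageclass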

For more information about Cisco HyperFlex CSI components and architecture, refer to https://www.cisco.com/c/en/us/td/docs/hyperconverged_systems/HyperFlex_HX_DataPlatformSoftware/HyperFlex_Kubernetes_Administration_Guide/4_0/b_Cisco_HyperFlex_Systems_Administration_Guide_for_Kubernetes_4_0/b_Cisco_HyperFlex_Systems_Administration_Guide_for_Kubernetes_4_0_chapter_0100.html.

Figure 15. Cisco HyperFlex CSI plug-in deployment


Local-storage provisioning for Cisco UCS C240 M5

As discussed previously, Cisco UCS C240 M5 bare-metal servers are dedicated to the Hadoop storage pool pods. Because Hadoop is a distributed storage platform with built-in resiliency, there is no need to provide an additional layer of data resiliency for Hadoop data. Therefore, in this reference architecture, the local disks (SSDs and NVMe drives) of the servers are used directly in a RAID 0 configuration for Hadoop storage. This approach achieves maximum storage capacity from the local drives and better performance due to data striping.

On each Cisco UCS C240 M5 server, logical volumes are created using local disks and are mounted in the /mnt/local-storage/ directory. Logical volumes are created using Linux logical volume management (LVM). As shown in Figure 16, physical volumes corresponding to the SSDs are mapped together into a volume group and then into two logical volumes. The logical volumes are striped across the disks, resulting in superior performance.
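A minimal sketch of this layout, assuming the seven data SSDs are presented as /dev/sdb through /dev/sdh (device, volume group, logical volume, and mount point names are illustrative only):

# Create physical volumes on the local SSDs and group them into a single volume group
pvcreate /dev/sd{b..h}
vgcreate vg-localstorage /dev/sd{b..h}

# Create two logical volumes striped across all seven disks
lvcreate -n lv-storage1 -i 7 -l 50%VG vg-localstorage
lvcreate -n lv-storage2 -i 7 -l 100%FREE vg-localstorage

# Create file systems and mount them under /mnt/local-storage/ for the local-storage provisioner
mkfs.xfs /dev/vg-localstorage/lv-storage1
mkfs.xfs /dev/vg-localstorage/lv-storage2
mkdir -p /mnt/local-storage/lv-storage1 /mnt/local-storage/lv-storage2
mount /dev/vg-localstorage/lv-storage1 /mnt/local-storage/lv-storage1
mount /dev/vg-localstorage/lv-storage2 /mnt/local-storage/lv-storage2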

Figure 16. Cisco UCS C240 M5 local-storage configuration

In this reference architecture, persistent storage provisioning on the Cisco UCS C240 M5 servers is achieved using the local-storage CSI plug-in. The local-storage provisioner uses the logical volumes created on each node as explained earlier and facilitates storage provisioning for BDC storage pool pods. The local-storage CSI plug-in is deployed as a pod of type daemonset on each Cisco UCS C240 M5 worker node and is responsible for managing the logical volumes of that node only. When the local-storage provisioner is deployed, a storage class is also created. Figure 17 shows one local-storage provisioner pod deployed on each Cisco UCS C240 M5 node and a storage class created for storage provisioning on the Cisco UCS C240 M5 servers in the cluster.

For more information about the local-storage provisioner, refer to https://github.com/microsoft/sql-server-samples/tree/master/samples/features/sql-big-data-cluster/deployment/kubeadm/ubuntu.
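For illustration, a persistent volume claim against this provisioner might look like the following sketch; the storage-class name local-storage is an assumption and must match the class created when the provisioner was deployed (BDC itself creates such claims automatically based on the storage class named in its configuration files):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-local-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 100Gi
EOF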


Figure 17. Local-storage provisioner pods

Configuring BDC pod placement on specific Kubernetes hosts

As discussed previously, BDC storage pool pods are restricted to run only on Cisco UCS C240 M5 Rack Servers, and the remaining pool pods are restricted to run only on virtual machines hosted on a Cisco HyperFlex cluster.

Some BDC pools (such as the data, SQL primary, and compute pools) are I/O and computation intensive. These computation- and I/O-intensive pools should be deployed on dedicated Kubernetes hosts. For example, two or more Kubernetes hosts (virtual machines or C240 M5 servers) can be dedicated to a specific pool. This approach helps ensure that pods of different BDC pools do not run on the same Kubernetes host, so that pods do not contend for host resources. It also enables better management and troubleshooting.

The following sections provide some guidelines about how to restrict BDC pools to run on specific Kubernetes hosts using Kubernetes labels and selector primitives.

Placement of BDC pools on specific Kubernetes hosts is controlled using two predefined node labels: mssql-cluster and mssql-resource. For a heterogeneous Kubernetes cluster, the mssql-cluster label can control which nodes are used for BDC pod deployment. Kubernetes nodes that do not have this label will not be considered for BDC pod deployment. The mssql-resource label controls pool-level placement. Kubernetes assigns pools to specific nodes that match the labels specified in the control.json and bdc.json files. For examples of the BDC control.json and bdc.json control files, refer to Appendix C of this document.

The following command shows the syntax for labeling nodes to assign designated workers to specific pools:

kubectl label nodes <worker node names> mssql-cluster=<label value> mssql-resource=<label value>

Storage pool pods

Storage pool pods are typically I/O and computation intensive and so are restricted to run only on Cisco UCS C240 M5 Rack Servers. Run the following kubectl command to label the C240 M5 nodes so that the storage pool pods are deployed only on the C240 M5 Kubernetes hosts:

kubectl label nodes bdc-w12-storage-1 bdc-w13-storage-2 bdc-w14-storage-3 bdc-w15-storage-4 mssql-cluster=bdc mssql-resource=bdc-storagepool

Note that in the reference architecture, one storage pod is deployed on each Cisco UCS C240 M5 node, using all the local drives through the local-storage provisioner.


Other pool pods: Data, compute, control, and primary

Other pool pods are restricted to run only on virtual machines deployed on a Cisco HyperFlex All-Flash or All-NVMe cluster. Run the kubectl commands shown here to label the virtual machine nodes on which the SQL primary, control, data, and compute pool pods need to be deployed.

The following command restricts the primary pool pods to be deployed on three worker nodes:

kubectl label nodes bdc-w1-master-1 bdc-w2-master-2 bdc-w3-master-3 mssql-cluster=bdc mssql-resource=bdc-masterpool

The following command restricts the control pool pods to be deployed on three worker nodes:

kubectl label nodes bdc-w4-control-1 bdc-w5-control-2 bdc-w6-control-3 mssql-cluster=bdc mssql-resource=bdc-controlpool

The following command restricts the compute pool pods to be deployed on two worker nodes:

kubectl label nodes bdc-w7-compute-1 bdc-w8-compute-2 mssql-cluster=bdc mssql-resource=bdc-computepool

The following command restricts the data pool pods to be deployed on three worker nodes:

kubectl label nodes bdc-w9-data-1 bdc-w10-data-2 bdc-w11-data-3 mssql-cluster=bdc mssql-resource=bdc-datapool

Table 2 provides more details about the mapping of Big Data Clusters pools to the virtual machines and rack servers used in this reference architecture.

Table 2. Microsoft SQL Server 2019 Big Data Clusters pod placement

● SQL primary: 3 pods on 3 virtual machines (bdc-w1-master-1, bdc-w2-master-2, bdc-w3-master-3); node labels: mssql-cluster=bdc, mssql-resource=bdc-masterpool
● Control: 3 pods on 3 virtual machines (bdc-w4-control-1, bdc-w5-control-2, bdc-w6-control-3); node labels: mssql-cluster=bdc, mssql-resource=bdc-controlpool
● Compute: 2 pods on 2 virtual machines (bdc-w7-compute-1, bdc-w8-compute-2); node labels: mssql-cluster=bdc, mssql-resource=bdc-computepool
● Data: 3 pods on 3 virtual machines (bdc-w9-data-1, bdc-w10-data-2, bdc-w11-data-3); node labels: mssql-cluster=bdc, mssql-resource=bdc-datapool
● Storage: 4 pods on 4 bare-metal servers (bdc-w12-storage-1, bdc-w13-storage-2, bdc-w14-storage-3, bdc-w15-storage-4); node labels: mssql-cluster=bdc, mssql-resource=bdc-storagepool

Note: In addition to the virtual machines listed in Table 2, one Kubernetes host runs on a dedicated virtual machine that acts as the Kubernetes control plane (Kubernetes primary node).


Note that as of the latest SQL Server 2019 BDC release, Release 15.0.4013.40, scaling of BDC pods is not supported. Hence, you must understand the requirements and size the BDC cluster during the initial deployment phase.

VMware vSphere virtual machine anti-affinity rules

An anti-affinity rule for a vSphere cluster places a group of virtual machines across multiple different hosts,

which prevents all virtual machines from failing at the same time if a single host fails. You should make sure

that virtual machines running pods of the same pool are deployed on different ESXi hosts. This approach prevents pods of the same pool from going down at the same time in the event of a single ESXi host failure.

For instance, using vSphere anti-affinity rules, deploy three virtual machines running SQL primary pods on

three different ESXi hosts so that failure of a single ESXi host does not result in the restart or failure of

multiple SQL primary instances at the same time. This recommendation can be applied to all BDC pool

(control, primary, compute, and data) pods running on a Cisco HyperFlex cluster.
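
As an illustration only, the following VMware PowerCLI command creates a VM-VM anti-affinity rule that keeps the three virtual machines hosting the SQL primary pods on separate ESXi hosts. The cluster name HX-Cluster is a placeholder, and equivalent rules can be created for the control, compute, and data pool virtual machines (or configured through the vSphere Client instead):

New-DrsRule -Cluster (Get-Cluster -Name "HX-Cluster") -Name "bdc-masterpool-anti-affinity" -KeepTogether $false -VM (Get-VM "bdc-w1-master-1","bdc-w2-master-2","bdc-w3-master-3")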

Figure 18.

VMware anti-affinity rules

High availability for critical services

For increased reliability, BDC allows you to deploy the SQL Server primary instance, HDFS NameNodes,

and Spark shared services in a highly available configuration using their corresponding built-in features.

AlwaysOn Availability Groups for SQL primary instances

SQL Server primary instances running in the primary pool can be deployed in a highly available

configuration using the native AlwaysOn Availability Groups (AG) feature. As shown in Figure 19, when BDC

is deployed with the primary pool in a high-availability configuration, it deploys three pods, each running a

SQL Server primary instance on three different virtual machines. The primary replica is responsible for


servicing read-write requests from users, and the two secondary replicas can serve read-only requests. If

the primary replica fails, one of the secondary replicas takes over the primary role and serves read-write

requests. When the previous primary replica pod recovers from the failure, it assumes the secondary role

and continues to get updates from the current primary replica pod.

Figure 19.

Microsoft SQL Server AlwaysOn high-availability feature for primary pool

Highly available big data components: NameNodes, Spark services, and Zookeeper

BDC allows you to deploy HDFS NameNodes and shared Spark services in a highly available configuration.

Zookeeper services are also deployed in a highly available configuration. Deploying multiple replicas for

these services enhances scalability, reliability, and load balancing of the workloads among the available

replicas.

For more information about deploying SQL primary instances, HDFS NameNodes, and Spark services in a

highly available configuration, refer to https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-high-availability?view=sql-server-ver15.

Logical network design

This section discusses the logical topology of the architecture described in this document.

Cisco HyperFlex node logical network

Figure 20 shows the logical network topology of a Cisco HyperFlex node. The Cisco HyperFlex system has

a predefined virtual network design at the ESXi hypervisor level. Four virtual switches are created, and

each uses two uplinks, which are each serviced by a virtual NIC (vNIC) defined in a Cisco UCS service

profile. The switch, vSwitch-hx-VM-network, is used by virtual machines for the Kubernetes host

management network. It has two uplinks active on both fabrics A and B. Two active data paths will result in

aggregated bandwidth.


Figure 20.

Cisco HyperFlex node logical network

Figure 21 shows the logical network topology of BDC deployed on a Kubernetes cluster that is spread

across the virtual machines and Cisco UCS C240 M5 bare-metal rack servers.

All the Kubernetes hosts deployed on virtual machines (the figure shows only one Cisco HyperFlex node)

and Cisco UCS C240 M5 bare-metal servers are connected to the network (shown with a thick blue line)

with a dedicated VLAN for Kubernetes cluster communication. The dotted blue line indicates the internal

pod network that is created and managed by CNI plug-ins. It uses the Kubernetes host network for pod-

to-pod communication deployed across the Kubernetes hosts. The Kubernetes hosts running inside the

virtual machines will be connected to the network (thick blue line) using the vSwitch-HX-VM-network

switch. For all other management purposes, all the Cisco HyperFlex and ESXi nodes and Cisco UCS C240

M5 bare-metal servers are connected to a separate network, which is represented with a black line.


Figure 21.

Logical network topology of a Kubernetes cluster

Infrastructure sizing guidelines

This section provides a few guidelines for sizing the infrastructure for deploying Big Data Clusters on Cisco

HyperFlex clusters and Cisco UCS C240 M5 servers.

Cisco HyperFlex cluster sizing guidelines for primary, compute, and data pools

SQL Server instances running within SQL primary, data, and compute pool pods are the main components

for resource consumption. You can use the Cisco HyperFlex sizer to size the Cisco HyperFlex cluster

configuration for SQL Server workloads.

SQL Server instances running within the SQL primary, data, and compute pools are relational database

engines and can be modeled using typical relational data warehouse workloads.

For Cisco HyperFlex cluster sizing, customers can start with two Cisco HyperFlex nodes worth of resources

for the data pool, one Cisco HyperFlex node worth of resources for the primary pool, and one Cisco

HyperFlex node worth of resources for the compute pool. The control pool does not need many resources,

so it can co-exist with any other pool on the same node. Note that as you add nodes to the Cisco

HyperFlex cluster, all the pools, and thereby the workload, will be distributed across all the nodes.

Use the Cisco HyperFlex sizer with SQL Server as the workload type to determine the Cisco HyperFlex

cluster configuration. As shown in Figure 22, use the custom OLAP database profile and enter the SQL

Server database capacity requirements along with CPU and memory details.

Note: To access the Cisco HyperFlex sizer, you must have a Cisco.com or partner login.


Figure 22.

Cisco HyperFlex sizer for BDC pools: Specifying the requirements

After you have submitted the requirements, the Cisco HyperFlex sizer tool will process the request and

provide a Cisco HyperFlex cluster configuration, as shown in Figure 23.


Figure 23.

Cisco HyperFlex sizer for BDC pools: Configuration details

Cisco UCS C240 M5 sizing guidelines for storage pool

According to the amount of data processed and resources required to process the data, you can classify

the workloads of a typical Spark cluster into Light, Balanced, and Memory Optimized profiles, as shown in

Table 3. When a Spark job is submitted to the cluster, the job is divided into smaller sets of tasks. Each

task is assigned to an executor for processing. Each executor in the Spark cluster will be assigned certain

resources to run the task. The resources required for the executor depend on the type of workload profile

and the data set size. Table 3 lists the cores and memory resources required by each executor for a given

profile. The resources required for the Spark driver depend on the size of the data set. For small data sizes

(less than 5 TB), one CPU and 2 GB of memory will be sufficient for the Spark driver. For medium data set

sizes (10 TB), the Spark driver needs two CPUs and 8 GB of memory. For large data set sizes (100 TB or

more), the Spark driver needs four CPUs and about 40 GB of memory for good performance.

Table 3. Spark profiles

Profile type | Description | Cores per executor | Memory per executor (GiB)
Light | This profile is used to run small jobs that use relatively few CPU and memory resources to process the smaller data sets. | 2 | 4
Balanced | This profile is used to process large volumes of data that require more CPU and memory resources. | 4 | 8
Memory Optimized | This profile is used to deliver fast performance for workloads that process large data sets in memory. | 4 | 16


Note: All the parameters used in this sizing exercise are used only to demonstrate representative sizing

calculations and do not reflect actual workload characterization. You must be sure to perform any sizing

based on actual workloads as determined through appropriate testing.

Each of the three profiles consists of a set of queries that can be classified into small, medium, and large

query types. For instance, a Balanced profile may have a mixture of 10 small queries, 3 medium-sized

queries, and 1 large query. Each query type is specified to process a certain amount of data. Table 4

shows typical data set sizes processed by each type of query.

Table 4. Spark query types

Query type | Number of concurrent active sessions | Targeted in-memory data set size (GB)
Small | 10 | 5
Medium | 5 | 50
Large | 1 | 100

You can use Tables 3 and 4 to help calculate the number of executors required to process a given Spark

workload profile.
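
The following Python sketch illustrates one simple way to combine Tables 3 and 4: it assumes that each concurrent session must hold its targeted data set in executor memory, which is a deliberate simplification of real Spark memory management and is shown only to make the arithmetic explicit.

import math

# Cores and memory (GiB) per executor for each profile (Table 3)
profiles = {"Light": (2, 4), "Balanced": (4, 8), "Memory Optimized": (4, 16)}

# Concurrent active sessions and targeted in-memory data set size in GB (Table 4)
query_types = {"Small": (10, 5), "Medium": (5, 50), "Large": (1, 100)}

def executors_required(profile):
    # Assume each session needs enough executors to hold its data set in memory
    cores, mem_gib = profiles[profile]
    return sum(sessions * math.ceil(dataset_gb / mem_gib)
               for sessions, dataset_gb in query_types.values())

print(executors_required("Balanced"))  # 10*1 + 5*7 + 1*13 = 58 executors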

To demonstrate the sizing guidelines for this reference architecture, a replication factor of 3 and a compression ratio of 3 are assumed. For Hadoop temporary space requirements, 20 percent of the Hadoop raw data capacity is assumed.

Table 5 provides storage requirements for 100 Tebibytes (TiB) of Hadoop raw data storage capacity with a

replication factor of 3 and a compression ratio of 3.

Table 5. HDFS raw data storage calculation

Attribute | Value
Raw data capacity required (TiB) | 100
Hadoop replication factor | 3
Total raw capacity (TiB) | 300
Compression ratio | 3
Total compressed capacity (TiB) | 100
Temporary space (%) | 20
Temporary space required (TiB) | 20
Total capacity required (TiB) | 120


Table 6 shows the individual server node configuration used for the sizing.

Table 6. Cisco UCS C240 M5 server configuration for storage pool

Component | Details
Server | Cisco UCS C240 M5
CPU | 2 x Intel Xeon Scalable 6230 processors (20 cores per processor)
Memory | 12 x 32-GB DDR4 (384 GB)
Boot | M.2 with 2 x 240-GB SSDs
Storage | 12 x 1.6-TB Enterprise Value SATA SSDs
VIC | 40 Gigabit Ethernet (Cisco UCS VIC 1387)
Storage controller | Cisco 12-Gbps SAS modular RAID controller with 4-GB flash-based write cache (FBWC) or Cisco 12-Gbps modular SAS host bus adapter (HBA)

Using the server configuration shown in Table 6, you can start with a minimum of 8 Cisco UCS C240 M5

servers, each configured with 40 cores, a minimum of 384 GB memory, and 12 x 1.6-TB SATA SSDs, for a

total Hadoop raw data capacity of 100 TiB.

You can scale by 8 x Cisco UCS C240 M5 servers for every 100 TiB of Hadoop raw data capacity required.

For instance, for a Hadoop raw data requirement of 200 TiB, 16 servers are required, for 300 TiB of raw

data capacity, 24 servers are required, and so on.
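
The capacity arithmetic behind Tables 5 and 6 and the 8-server starting point can be expressed as a short Python sketch; the values simply restate the assumptions above (replication factor of 3, compression ratio of 3, 20 percent temporary space, and 12 x 1.6-TB SSDs per server) and are not a substitute for proper sizing against real workloads.

raw_data_tib = 100                                  # Hadoop raw data capacity required
replication, compression, temp_pct = 3, 3, 0.20

total_required_tib = raw_data_tib * replication / compression * (1 + temp_pct)
print(total_required_tib)                           # 120.0 TiB, matching Table 5

# Aggregate raw disk capacity of the recommended 8-server starting point (Table 6)
servers, drives_per_server, drive_tb = 8, 12, 1.6
aggregate_tib = servers * drives_per_server * drive_tb * (10**12 / 2**40)   # TB to TiB
print(round(aggregate_tib, 1))                      # ~139.7 TiB, above the 120 TiB required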

BDC deployment on Cisco UCS

Before beginning your BDC installation, verify that the Kubernetes cluster is deployed across the virtual

machines and Cisco UCS C240 M5 servers as discussed in the preceding sections. Also verify that the

appropriate network plug-ins are deployed, and that the Cisco HyperFlex CSI and local-storage

provisioners are deployed on the Kubernetes cluster.

To deploy BDC, the tools listed in Table 7 need to be installed on a client machine.

Table 7. Tools required for Microsoft SQL Server 2019 Big Data Clusters deployment

Tool | Description
azdata | Command-line tool for installing and managing a BDC
Azure Data Studio | Cross-platform graphical tool for querying SQL Server
Data Virtualization extension | Extension for Azure Data Studio that provides a data virtualization wizard
Python | An interpreted, object-oriented, high-level programming language with dynamic semantics; many parts of Big Data Clusters for SQL Server use Python
kubectl | Command-line tool for monitoring the underlying Kubernetes cluster


For more information about how to install the tools listed in Table 7, refer to

https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-big-data-tools?view=sql-server-ver15.

Verify that the client machine can contact the Kubernetes cluster by copying the Kubernetes configuration

file to the client machine and setting the path to the file location. Verify that kubectl points to the correct

cluster context (Figure 24).

Figure 24.

Client machine configuration
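
For example, assuming the kubeconfig file has been copied to ~/.kube/config on the client machine (any other path can be exported through the KUBECONFIG variable), the following commands confirm the active context and connectivity to the cluster:

export KUBECONFIG=~/.kube/config
kubectl config current-context
kubectl get nodes -o wide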

The next step is to use the azdata tool to deploy BDC. The azdata tool has a few built-in profiles that

deploy BDC with default configurations. However, the azdata tool also allows you to customize your

deployment to accommodate the workloads you are planning to run. Note that you cannot change the

scale (number of replicas) or storage settings for BDC services after deployment, so you must plan your

deployment configuration carefully to avoid capacity issues.

You can customize BDC by using built-in profiles that are available in the azdata tool. Figure 25 shows how

to list the built-in profiles. Use the kubeadm-prod profile to deploy BDC for a production use case.

Figure 25.

Listing and selecting BDC built-in profiles
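
For reference, the built-in profiles can be listed and a customizable copy of the kubeadm-prod profile created with azdata commands similar to the following; the target folder name custom is the folder referenced in the next section:

azdata bdc config list
azdata bdc config init --source kubeadm-prod --target custom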


Two JavaScript Object Notation (JSON) files are created under the custom folder to control the BDC

deployment. These two files need to be updated to match your requirements. Appendix C of this document

shows the contents of these two files customized for the current reference architecture. For more

information about how to update these two files to match your deployment requirements, refer to

https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-custom-configuration?view=sql-server-ver15.

After the bdc.json and control.json files have been updated to meet your requirements, the BDC cluster

can be deployed using the azdata tool as shown in Figure 26.

Figure 26.

Microsoft SQL Server 2019 Big Data Clusters deployment
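
As a sketch, a deployment from the customized profile can be started with a command along these lines; the controller and SQL Server credentials are supplied through environment variables (for example, AZDATA_USERNAME and AZDATA_PASSWORD in recent azdata releases), which must be exported before running the command:

azdata bdc create --config-profile custom --accept-eula yes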

After BDC has been successfully deployed, you can check all the services, as shown in Figure 27.


Figure 27.

Microsoft SQL Server 2019 Big Data Clusters services
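
After logging in to the cluster with azdata login, commands such as the following can be used to review overall service health and the endpoints exposed by the cluster, producing output similar to Figure 27:

azdata bdc status show
azdata bdc endpoint list -o table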

Note that the deployment time depends on the speed of your Internet access, because the deployment downloads the BDC images from the Microsoft public registry. To speed up the deployment, you can download the BDC images to your local repositories in advance and update the control.json file accordingly.

Figure 28 shows all the BDC pods after successful deployment.

Figure 28.

Microsoft SQL Server 2019 Big Data Clusters pods
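
The same pod inventory can also be retrieved directly from Kubernetes; the namespace matches the cluster name defined in control.json (bdc-clus in the Appendix C example):

kubectl get pods -n bdc-clus -o wide
kubectl get pvc -n bdc-clus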


Validation

Several tests were performed to validate the robustness of the reference architecture in typical component

failure scenarios.

In the event of any unplanned downtime for a Cisco HyperFlex node, the Kubernetes host (virtual machine)

will be moved from the failed node and restarted on a different node by the VMware High Availability

feature. This behavior increases the overall uptime of the Kubernetes host and thereby the uptime of the

BDC pods.

Failure of a Cisco UCS C240 M5 server triggers rescheduling of the storage pool pod on a different Cisco UCS C240 M5 server. However, because the persistent volumes of the failed pod are local to the failed server and are not available on the new node, Kubernetes cannot bring up the pod there. Because of internal Hadoop replication, though, this behavior does not affect the overall availability of the storage pool.

Network failures such as link failures from nodes to fabric interconnects and from fabric interconnects to

upstream Cisco Nexus switches do not affect the availability of BDC because of the use of redundant

network links.

Performance validation

A big data use case was used to validate the solution. In this scenario, BDC was deployed on a four-node

Cisco HyperFlex All-Flash cluster and four Cisco UCS C240 M5 Rack Servers, using the Databricks TPC-

DS toolkit. The workload in this toolkit is derived from TPC-DS; its results are not comparable to published

TPC-DS results because the Databricks TPC-DS toolkit does not comply with the TPC-DS specifications.

The validation testing presented here used a 10-TB data set (with a scale factor of 10000) consisting of

row-structured and semi-structured data. The Databricks TPC-DS toolkit schema describes a data model

of a retail enterprise selling through three channels (stores, catalogs, and web). The Databricks TPC-DS

toolkit in the test runs a read-only suite of 99 queries, some of which are split into queries A and B,

resulting in 104 Spark SQL queries.

It is important to verify that all the system resources, such as CPU, memory, and the individual disks on

each storage node, are balanced in terms of utilization while processing terabytes or petabytes of data.

Resource utilization on each node while the Databricks TPC-DS toolkit queries ran serially on the 10-TB

data set is described here. The 10-TB testing was performed with 68 Spark executors, with 17 executors

running per node. Each executor was configured with a total of 36 GB of memory (32 GB of executor

memory plus 4 GB of memory overhead) and 4 CPUs. The Spark driver is configured with 6 CPUs and 8 GB

of memory. Each Cisco UCS C240 M5 node is configured to use up to 700 GB of memory, leaving the

remaining memory for use by other processes.

Figure 29 shows the peak disk utilization on the individual SSDs on one of the C240 M5 bare-metal storage

nodes during the processing of query 64. Approximately 350 MBps (250 MBps write plus 100 MBps read)

of aggregated bandwidth is delivered to each SSD, with less than 2 milliseconds (ms) of latency. An

aggregated bandwidth of 2.5 GBps is delivered by each node. A similar trend occurs on the remaining

three nodes. As the workload increases, greater aggregated bandwidth is expected. More disks can be

added on the individual C240 M5 nodes to scale up the bandwidth requirements. For a scale-out scenario,

more C240 M5 nodes can be added to the cluster.
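
Per-device throughput and latency figures of the kind shown in Figure 29 can also be sampled directly on a storage node with standard Linux tools; for example, the following iostat command (from the sysstat package) reports extended statistics in megabytes at 5-second intervals:

iostat -xm 5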


Figure 29.

Drive performance on one Cisco UCS C240 M5 storage node

Figure 30 shows the peak CPU utilization on four C240 M5 storage nodes during query 14 processing. As

shown in the figure, all the nodes are utilized equally. The screen image was captured using the Grafana

dashboard, which is integrated into BDC.


Figure 30.

CPU utilization on each Cisco UCS C240 M5 storage node

Figure 31 shows the memory consumption of one of the C240 M5 nodes during the entire test.

Figure 31.

Memory utilization of one Cisco UCS C240 M5 storage node throughout test


Monitoring the Kubernetes cluster and hosts with AppDynamics

AppDynamics is an end-to-end application performance monitoring, or APM, solution. It provides deeper

insight into all the application layers, end-user activity, and infrastructure across a system. It has a strong

monitoring capability for the underlying Kubernetes cluster, where it can collect performance metrics for all

the pods running on the cluster. AppDynamics can collect Kubernetes node performance metrics as well.

In this reference architecture, a Kubernetes cluster hosting SQL Server 2019 Big Data Clusters was

monitored using AppDynamics. To monitor the underlying Kubernetes cluster, the AppDynamics cluster

agent was deployed on the cluster. The AppDynamics cluster agent is installed as a pod in the Kubernetes

cluster and is responsible for monitoring and collecting the various performance metrics of all the pods

running across the clusters.

For information about installing and configuring the cluster agent on Kubernetes, refer to

https://docs.appdynamics.com/display/PRO45/Configure+the+Cluster+Agent#ConfiguretheClusterAgent-ConfigureProxySupport.

Figure 32 shows a sample YAML file used to deploy the cluster agent as a pod in its own namespace, appdynamics. Do not forget to update the proxy details when your cluster is behind a proxy.

Figure 32.

AppDynamics cluster agent deployment
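
At a high level, and hedged as a sketch based on the AppDynamics documentation referenced above, the installation flow looks like the following; the YAML file names and the secret used for the controller access key come from the AppDynamics cluster agent distribution and may vary between agent versions:

kubectl create namespace appdynamics
kubectl create -f cluster-agent-operator.yaml
kubectl -n appdynamics create secret generic cluster-agent-secret --from-literal=controller-key=<access-key>
kubectl apply -f cluster-agent.yaml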

After the cluster agent is installed, the Kubernetes cluster for BDC will be automatically discovered in the

AppDynamics portal, as shown in Figure 33.

Figure 33.

AppDynamics controller


Figure 34 shows a pod-level inventory, and Figure 35 shows some of the metrics collected by the

AppDynamics cluster agent.

Figure 34.

Microsoft SQL Server 2019 Big Data Clusters pod inventory

Figure 35.

Microsoft SQL Server 2019 Big Data Clusters storage pod metrics


The AppDynamics controller allows you to use the provided dashboards as well as to create your own dashboards and reports from the collected metrics for later analysis of cluster behavior. Using AppDynamics, you can baseline any metric and use health rules to create alerts.

Conclusion

SQL Server 2019 Big Data Clusters enables organizations to bring high-value relational data and high-

volume big data together in a unified, scalable data platform. Enterprises can use the power of PolyBase to

virtualize their data stores, create data lakes, and create scalable data marts in a unified, secure

environment without needing to implement slow, costly extract, transform, and load (ETL) pipelines. It

makes data-driven applications and analysis more responsive and productive.

Cisco UCS integrates network, computing, storage access, and virtualization resources into a single

cohesive system, providing an optimized infrastructure that enables organizations to get the most from

their Big Data Clusters deployments. An integrated low-latency, lossless, 25, 40, and 100 Gigabit Ethernet

unified network fabric with enterprise-class, x86-architecture servers and a variety of local-storage

options helps optimize BDC deployments and improve performance. The Cisco Intersight platform and

Cisco UCS Director facilitate end-to-end deployment, management, and monitoring of Cisco HyperFlex

clusters.


Appendix A: Cisco UCS C240 M5 storage options

Table 8 lists the storage options available with the Cisco UCS C240 M5 Rack Server.

Table 8. Cisco UCS C240 M5 Rack Server storage options

Cisco UCS C240 M5 models | Number of drive slots
UCSC-C240-M5SX | 24 SFF front + 2 SFF rear SAS/SATA SSDs or HDDs
UCSC-C240-M5SN | 10 SFF NVMe (8 front and 2 rear) + 16 front SAS/SATA SSDs or HDDs
UCSC-C240-M5S | 8 SFF front + 2 SFF rear SAS/SATA SSDs or HDDs
UCSC-C240-M5L | 12 LFF front + 2 SFF rear SAS/SATA SSDs or HDDs

Appendix B: Bills of materials

This section provides bills of materials (BoMs) for a four-node Cisco HyperFlex All-Flash or All-NVMe

cluster and four Cisco UCS C240 M5 servers for hosting SQL Server 2019 Big Data Clusters.

Table 9 provides the BoM for a four-node Cisco HyperFlex All-Flash cluster consisting of Cisco HyperFlex

HX240c M5SX All Flash Nodes.

Table 9. BoM for a four-node Cisco HyperFlex All-Flash cluster

Part number Description Qty

HXAF240C-M5SX Cisco HyperFlex HX240c M5 All Flash Node 4

HX-MR-X32G2RT-H 32GB DDR4-2933-MHz RDIMM/2Rx4/1.2v 48

HX-SAS-M5HD Cisco 12G Modular SAS HBA for up to 26 drives 4

HX-PCI-1-C240M5 Riser 1 incl 3 PCIe slots (x8, x16, x8) 4

HX-PCI-2B-240M5 Riser 2B incl 3PCIeslots(x8,x16,x8)+2NVMe(1cnctr) supports GPU 4

HX-SD960G61X-EV 960GB 2.5 inch Enterprise Value 6G SATA SSD 40

HX-NVMEHW-H1600 1.6TB 2.5in U.2 HGST SN200 NVMe High Perf. High Endurance 4

HX-SD240GM1X-EV 240GB 2.5 inch Enterprise Value 6G SATA SSD 4

HX-M2-240GB 240GB SATA M.2 4

HX-MLOM-C40Q-03 Cisco VIC 1387 Dual Port 40Gb QSFP CNA MLOM 4

HX-MSD-32G 32GB Micro SD Card for UCS M5 servers 4

HX-PSU1-1600W Cisco UCS 1600W AC Power Supply for Rack Server 8

CAB-C13-CBN Cabinet Jumper Power Cord, 250 VAC 10A, C14-C13 Connectors 8

HX-RAILB-M4 Ball Bearing Rail Kit for C220 M4 and C240 M4 rack servers 4

UCS-MSTOR-M2 Mini Storage carrier for M.2 SATA/NVME (holds up to 2) 4

UCSC-HS-C240M5 Heat sink for UCS C240 M5 rack servers 150W CPUs & below 8

UCSC-RNVME-240M5 C240 M5 Rear NVMe CBL(1) kit, Rear NVMe CBL, backplane 4

HXAF240C-BZL-M5SX HXAF240C M5 Security Bezel 4


UCSC-BBLKD-S2 UCS C-Series M5 SFF drive blanking panel 56

HX-CPU-I6230 Intel 6230 2.1GHz/125W 20C/22 MB 3DX DDR4 2933 MHz 8

HX-VSP-6-7-FND-D Factory Installed -vSphere SW 6.7 End-user to provide License 4

HX-VSP-6-7-FND-DL Factory Installed - VMware vSphere 6.7 SW Download 4

CON-SNT-AF240CSX SNTC 8X5XNBD Cisco HyperFlex HX240c M5 All Flash Node 4

Table 10 provides the BoM for a four-node Cisco HyperFlex All-NVMe cluster consisting of Cisco

HyperFlex HX220c M5N All NVMe Nodes.

Table 10. BoM for a four-node Cisco HyperFlex all-NVMe cluster

Part number Description Qty

HXAF220C-M5SN Cisco HXAF220c M5 All NVMe Hyperflex System 4

CON-SNT-AF20M5SN SNTC-8X5XNBD Cisco HXAF220c M5 All NVMe Hyperflex System 4

HX-MR-X32G2RT-H 32GB DDR4-2933-MHz RDIMM/2Rx4/1.2v 48

HX-NVME2H-I4000 Cisco 2.5" U.2 4.0TB Intel P4510 NVMe High Perf. Value Endu 24

HX-NVMEXPB-I375 375GB 2.5in Intel Optane NVMe Extreme Performance SSD 4

HX-NVME2H-I1000 Cisco 2.5" U.2 1.0 TB Intel P4510 NVMe High Perf. Value Endu 4

HX-M2-240GB 240GB SATA M.2 4

HX-MLOM-C40Q-03 Cisco VIC 1387 Dual Port 40Gb QSFP CNA MLOM 4

HX-PSU1-1600W Cisco UCS 1600W AC Power Supply for Rack Server 8

CAB-C13-CBN Cabinet Jumper Power Cord, 250 VAC 10A, C14-C13 Connectors 8

HX-MSD-32G 32GB Micro SD Card for UCS M5 servers 4

HX-RAILF-M4 Friction Rail Kit for C220 M4 rack servers 4

HXAF220C-BZL-M5SN HXAF220C M5 All NVMe Security Bezel 4

UCS-MSTOR-M2 Mini Storage carrier for M.2 SATA/NVME (holds up to 2) 4

UCSC-BBLKD-S2 UCS C-Series M5 SFF drive blanking panel 8

UCSC-HS-C220M5 Heat sink for UCS C220 M5 rack servers 150W CPUs & below 8

HX-CPU-I6230 Intel 6230 2.1GHz/125W 20C/22 MB 3DX DDR4 2933 MHz 8

HX-VSP-6-7-STD-D Factory Installed - VMware vSphere 6.7 Std SW and Lic (2CPU) 4

HX-VSP-6-7-STD-DL Factory Installed - vSphere 6.7 Standard SW Download 4

Table 11 provides the BoM for a four-node Cisco UCS C240 M5 bare-metal server used for hosting the

BDC storage pool.


Table 11. BoM for a four-node Cisco UCS C240 M5 bare-metal server

Part number Description Qty

UCSC-C240-M5SX UCS C240 M5 24 SFF + 2 rear drives w/o CPU,mem,HD,PCIe,PS 4

UCS-MR-X32G2RT-H 32GB DDR4-2933-MHz RDIMM/2Rx4/1.2v 48

UCSC-PCI-1B-240M5 Riser 1B incl 3 PCIe slots (x8, x8, x8); all slots from CPU1 4

UCSC-PCI-2C-240M5 Riser 2C incl 3 PCIe slots (3 x8) supports front+rear NVMe 4

UCSC-MLOM-C40Q-03 Cisco VIC 1387 Dual Port 40Gb QSFP CNA MLOM 4

UCS-M2-240GB 240GB SATA M.2 8

UCSC-PSU1-1600W Cisco UCS 1600W AC Power Supply for Rack Server 8

CAB-C13-CBN Cabinet Jumper Power Cord, 250 VAC 10A, C14-C13 Connectors 8

UCSC-RAILB-M4 Ball Bearing Rail Kit for C220 & C240 M4 & M5 rack servers 4

CIMC-LATEST IMC SW (Recommended) latest release for C-Series Servers. 4

UCS-SID-INFR-BD Big Data and Analytics Platform (Hadoop/IoT/ITOA/AI/ML) 4

UCS-SID-WKL-MSFT Microsoft 4

UCS-MSTOR-M2 Mini Storage carrier for M.2 SATA/NVME (holds up to 2) 4

UCSC-HS-C240M5 Heat sink for UCS C240 M5 rack servers 150W CPUs & below 8

CBL-SC-MR12GM5P Super Cap cable for UCSC-RAID-M5HD 4

UCSC-BBLKD-S2 UCS C-Series M5 SFF drive blanking panel 56

UCSC-SCAP-M5 Super Cap for UCSC-RAID-M5, UCSC-MRAID1GB-KIT 4

UCS-CPU-I6230 Intel 6230 2.1GHz/125W 20C/27.50MB DCP DDR4 2933 MHz 8

UCSC-RAID-M5HD Cisco 12G Modular RAID controller with 4GB cache 4

UCS-SD16TM1X-EV 1.6TB 2.5 inch Enterprise Value 6G SATA SSD 48

CON-SNT-C240M5SX SNTC 8X5XNBD UCS C240 M5 24 SFF + 2 rear drives w/o CPU, mem 4

CON-ISV1-EL2S2V3A ISV 24X7 RHEL Server 2Socket-OR-2Virtual; ANNUAL List Price 4

For the SQL Server 2019 Big Data Clusters licensing guide, refer to https://www.microsoft.com/en-us/sql-server/sql-server-2019-pricing.

Appendix C: Customized BDC deployment on Cisco UCS

The contents of sample control.json and bdc.json files are presented here for use in customized BDC

deployment on Cisco UCS.

cat control.json

{

"apiVersion": "v1",

"metadata": {

"kind": "Cluster",

"name": "bdc-clus"


},

"spec": {

"docker": {

"registry": "mcr.microsoft.com",

"repository": "mssql/bdc",

"imageTag": "2019-CU1-ubuntu-16.04",

"imagePullPolicy": "Always"

},

"storage": {

"data": {

"className": "csi-hxcsi-default",

"accessMode": "ReadWriteOnce",

"size": "150Gi"

},

"logs": {

"className": "csi-hxcsi-default",

"accessMode": "ReadWriteOnce",

"size": "100Gi"

}

},

"endpoints": [

{

"name": "Controller",

"serviceType": "NodePort",

"port": 30080

},

{

"name": "ServiceProxy",

"serviceType": "NodePort",

"port": 30777

}

],

"nodeLabel": "bdc-controlpool",

"clusterLabel": "bdc",

"settings": {

"ElasticSearch": {

"vm.max_map_count": "-1"

}

}

}

}


cat bdc.json

{

"apiVersion": "v1",

"metadata": {

"kind": "BigDataCluster",

"name": "bdc-clus"

},

"spec": {

"resources": {

"nmnode-0": {

"spec": {

"replicas": 2,

"nodeLabel": "bdc-controlpool"

}

},

"sparkhead": {

"spec": {

"replicas": 2,

"nodeLabel": "bdc-controlpool"

}

},

"zookeeper": {

"spec": {

"replicas": 3,

"nodeLabel": "bdc-controlpool"

}

},

"gateway": {

"spec": {

"replicas": 1,

"endpoints": [

{

"name": "Knox",

"serviceType": "NodePort",

"port": 30443

}

],

"nodeLabel": "bdc-controlpool"

}

},

"appproxy": {

"spec": {


"replicas": 1,

"endpoints": [

{

"name": "AppServiceProxy",

"serviceType": "NodePort",

"port": 30778

}

],

"nodeLabel": "bdc-controlpool"

}

},

"master": {

"metadata": {

"kind": "Pool",

"name": "default"

},

"spec": {

"type": "Master",

"replicas": 3,

"storage": {

"data": {

"size": "150Gi",

"className": "csi-hxcsi-default",

"accessMode": "ReadWriteOnce"

},

"logs": {

"size": "100Gi",

"className": "csi-hxcsi-default",

"accessMode": "ReadWriteOnce"

}

},

"nodeLabel": "bdc-master",

"endpoints": [

{

"name": "Master",

"serviceType": "NodePort",

"port": 31433

},

{

"name": "MasterSecondary",

"serviceType": "NodePort",

"port": 31436


}

],

"settings": {

"sql": {

"hadr.enabled": "true"

}

}

}

},

"compute-0": {

"metadata": {

"kind": "Pool",

"name": "default"

},

"spec": {

"type": "Compute",

"replicas": 2,

"storage": {

"data": {

"size": "250Gi",

"className": "csi-hxcsi-default",

"accessMode": "ReadWriteOnce"

},

"logs": {

"size": "170Gi",

"className": "csi-hxcsi-default",

"accessMode": "ReadWriteOnce"

}

},

"nodeLabel": "bdc-computepool"

}

},

"data-0": {

"metadata": {

"kind": "Pool",

"name": "default"

},

"spec": {

"type": "Data",

"replicas": 3,

"storage": {

"data": {


"size": "600Gi",

"className": "csi-hxcsi-default",

"accessMode": "ReadWriteOnce"

},

"logs": {

"size": "200Gi",

"className": "csi-hxcsi-default",

"accessMode": "ReadWriteOnce"

}

},

"nodeLabel": "bdc-datapool"

}

},

"storage-0": {

"metadata": {

"kind": "Pool",

"name": "default"

},

"spec": {

"type": "Storage",

"replicas": 4,

"storage": {

"data": {

"size": "9728Gi",

"className": "local-storage",

"accessMode": "ReadWriteOnce"

},

"logs": {

"size": "150Gi",

"className": "local-storage",

"accessMode": "ReadWriteOnce"

}

},

"nodeLabel": "bdc-storagepool",

"settings": {

"spark": {

"includeSpark": "true"

}

}

}

}

},


"services": {

"sql": {

"resources": [

"master",

"compute-0",

"data-0",

"storage-0"

]

},

"hdfs": {

"resources": [

"nmnode-0",

"zookeeper",

"storage-0",

"sparkhead"

],

"settings": {

"hdfs-site.dfs.replication": "3"

}

},

"spark": {

"resources": [

"sparkhead",

"storage-0"

],

"settings": {

"spark-defaults-conf.spark.driver.memory": "8g",

"spark-defaults-conf.spark.driver.cores": "8",

"spark-defaults-conf.spark.executor.instances": "36",

"spark-defaults-conf.spark.executor.memory": "32768m",

"spark-defaults-conf.spark.executor.cores": "4",

"yarn-site.yarn.nodemanager.resource.memory-mb": "716800",

"yarn-site.yarn.nodemanager.resource.cpu-vcores": "70",

"yarn-site.yarn.scheduler.maximum-allocation-mb": "737280",

"yarn-site.yarn.scheduler.maximum-allocation-vcores": "6",

"yarn-site.yarn.scheduler.capacity.maximum-am-resource-percent": "0.3"

}

}

}

}

}


Printed in USA 220104.4 05/20