EsgynDB Enterprise 2.0 Platform Reference Architecture
This document outlines a Platform Reference Architecture for EsgynDB Enterprise, which is built on the Apache Trafodion™ (Incubating) implementation with licensed support and extensions. It presents server configurations for various vendors and describes considerations for sizing an EsgynDB application.
1 INTRODUCTION
2 ARCHITECTURE
3 CAPACITY PLANNING
3.1 PROCESSING USAGE
3.2 MEMORY USAGE
3.3 DISK USAGE
3.4 NETWORK USAGE
4 REFERENCE ARCHITECTURE GUIDANCE FOR PRODUCTION BARE METAL CLUSTER
4.1 MEDIUM/LARGE DEPLOYMENT
4.2 SMALL DEPLOYMENT
5 CLOUD DEPLOYMENT
6 CONCLUSION
1 Introduction
The Apache Trafodion (Incubating) project provides a full transactional SQL database integrated into the
Apache Hadoop™ ecosystem to support operational workloads. EsgynDB Enterprise 2.0, built on Apache
Trafodion, offers a fully supported, enterprise-ready version with extensions for additional features,
including cross-datacenter support in EsgynDB Enterprise Advanced 2.0.
This reference architecture describes a purpose-built EsgynDB Enterprise installation. Specifically, it
describes the architecture and provisioning for a cluster whose purpose is running one or more EsgynDB
application workloads.
This reference architecture does not describe a configuration where EsgynDB Enterprise is part of a
wider Hadoop cluster running other ecosystem applications such as MapReduce. Clusters running mixed
workloads can start from the sizing and provisioning information here, but the final sizing and provisioning
must also incorporate requirements from the other workloads in the cluster; that analysis is beyond the
scope of this document.
2 Architecture
Apache Trafodion provides an enterprise-class, web-scale database engine in the Hadoop ecosystem. In
addition, Trafodion provides the SQL query language and transactional semantics for native Apache HBase™
and Apache Hive™ tables. Trafodion provides transactional support for data stored in HBase: it
supports fully distributed ACID transactions across multiple statements, tables, and rows, which enables
EsgynDB Enterprise to support operational workloads that are generally beyond most Hadoop ecosystem
components.
EsgynDB Enterprise Release 2.0 extends Apache Trafodion by providing additional features such as
cross-datacenter support, using the architecture depicted below.
The architecture involves one or more clients concurrently using SQL queries to access the data
managed by EsgynDB via a driver (ODBC/JDBC/ADO.NET). The driver library provides the connection
and session between the application (which might or might not execute on the same cluster) and the
SQL engine layer.
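To illustrate the driver layer, the following sketch shows a Java client connecting through the JDBC Type 4 driver and running a query. The host name, port, user, password, and table are placeholders; confirm the driver class name, URL format, and default port against your EsgynDB client installation documentation.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class EsgynDBConnectExample {
        public static void main(String[] args) throws Exception {
            // Placeholder endpoint: the DCS Master is the initial connection point;
            // it assigns the session to an mxosrvr on one of the data nodes.
            String url = "jdbc:t4jdbc://dcs-master.example.com:23400/:";

            // Trafodion Type 4 JDBC driver class, shipped with the client package.
            Class.forName("org.trafodion.jdbc.t4.T4Driver");

            try (Connection conn = DriverManager.getConnection(url, "dbuser", "dbpassword");
                 Statement stmt = conn.createStatement();
                 // Placeholder query against a hypothetical table.
                 ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM trafodion.seabase.orders")) {
                while (rs.next()) {
                    System.out.println("Row count: " + rs.getLong(1));
                }
            }
        }
    }

Once the session is established, SQL statements flow through the mxosrvr hosting the session, which compiles the query and coordinates execution as described next.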
In the SQL engine layer, a master query execution server process prepares and executes each query.
Depending on the specifics of the workload, the query might involve a distributed transaction
manager, or one or more groups of executor server processes (ESPs) that execute portions of the query
plan in parallel. These groups of ESPs (for a given query, there might be zero or more groups) reflect
the degree of parallelism for the query.
The query can reference native HBase or Hive tables as well. Ultimately, EsgynDB uses HDFS as the
storage layer foundation, with an appropriate replication factor (usually 3, but 2 in some cloud
configurations) to provide availability if a node fails.
Significant processes used for query processing include:
Process Name | Description | Distribution | Count
DCS Master | Initial connection point for locating a session-hosting mxosrvr | On one single node | One active per cluster, often configured with a floating IP for high availability
DCS Server | Process that manages status and connection usage for mxosrvr processes | On each node | One for each node where mxosrvrs run
Master executor (mxosrvr) | Master executor process that hosts the SQL session, does query compilation and execution of the root operator | Multiple on all data nodes in the instance | Count defines the maximum number of concurrent sessions
Executor Server Process (ESP) | Executes parallel fragments of SQL plans | Multiple run on all data nodes in the cluster, in variable-size groups | Workload dependent: determined by concurrent parallel queries, query plan, and degree of parallelism
DTM | Maintains transactional state and log outcome information for transactions | Runs on all data nodes in the instance | One per data node
For the EsgynDB Enterprise 2.0 version, cross-datacenter support is implemented via the DTM, which
communicates with Transaction Manager processes on the peer datacenter clusters to replicate the
transactions on both clusters.
The EsgynDB Manager was simplified in the architecture picture above to show its relationship to the
query processing engine. The EsgynDB Manager subsystem expands to multiple processes, as depicted
in the following picture:
EsgynDB Manager processes include:
Process Name | Description | Distribution | Count
DB Manager | Web application server that the browser connects to | On one single node | One per cluster, on the first data node
OpenTSDB | Lightweight service processes for collecting time-series metrics | On each node | One per node
TCollectors | Collection scripts that collect time-based metrics at an interval | Multiple on all data nodes in the instance; processes per node vary | System and HBase metrics are collected on each node; EsgynDB metrics are collected cluster-wide from a process on the first data node
REST Server | Process that handles REST requests from on- and off-cluster clients | One per cluster | One per cluster, on the first data node
In addition to the listed processes used for query-processing and manageability, there are other
processes that are part of the EsgynDB stack supporting its runtime execution environment. These
processes generally use fewer resources and have little material impact on platform sizing and
provisioning.
EsgynDB Enterprise is integrated into the Hadoop ecosystem as depicted in the following picture:
The EsgynDB database engine uses HBase for storage services. As such, it relies on HBase
configuration and tuning to achieve optimal performance. EsgynDB cluster provisioning must incorporate
HBase configuration considerations.
HBase processes can be divided into two classes: control processes and data processes. Control
processes are singletons involved in managing the HBase system and its metadata. Data processes are
involved in serving the data itself, including reading, updating, and writing (HBase scan, get, and put
operations).
HBase control processes include:
Process | Description
HMaster | Metadata and table creation/deletion
ZooKeeper | Not an HBase process, but used for information management and coordination across nodes
HBase data processes include:
Process | Description
RegionServer | Controls data serving, including servicing get/put, and separation of data into individual regions
HBase in turn uses HDFS services for scalability, availability, and recovery (replication) within the cluster.
As such, EsgynDB cluster provisioning must also incorporate HDFS configuration considerations,
including replication. HDFS control processes are singletons that manage the file system and control the
placement of individual data blocks. Data processes are involved in reading and writing that data.
HDFS control processes include:
Process | Description
NameNode | Manages the metadata files that are used to map blocks to individual files and to select locations for replication
Secondary NameNode | Takes a checkpoint of all metadata from the NameNode once per interval (one hour by default). This data can be used to recreate the block-to-file mappings if the NameNode is lost; however, it is not simply a hot backup for the NameNode
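The checkpoint interval is controlled through standard HDFS configuration. As an illustrative sketch (assuming a stock Apache Hadoop 2.x configuration; verify the property name and default against your distribution), the one-hour default corresponds to:

    <!-- hdfs-site.xml: seconds between Secondary NameNode checkpoints (3600 = 1 hour) -->
    <property>
      <name>dfs.namenode.checkpoint.period</name>
      <value>3600</value>
    </property>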
HDFS data processes include:
Process | Description
DataNode | Serves reads and writes for individual files, and sends periodic "I'm alive" (heartbeat) messages, including the files/blocks it is managing, to the NameNode
In addition to the HBase and HDFS control processes listed above, other control node processes include:
Process | Description
Management Server Process | Node hosting the management web UI (Ambari, Cloudera Manager, etc.). Some management servers also perform detailed database and analytic functions
In smaller clusters, control processes and data processes might reside on the same node. For larger
clusters, management processes have significantly different provisioning requirements and so are often
isolated on different nodes. The reference architecture assumes separate control and data nodes.
3 Capacity Planning
This section discusses issues and recommendations to consider when sizing an EsgynDB Enterprise
database.
3.1 Processing Usage
When sizing the processing power for an EsgynDB Enterprise cluster, consider the following:
• In a typical high-performance configuration, nodes for management are configured separately
from data nodes. The two types are typically provisioned differently for storage (size,
configuration) as well as network and memory.
• In a very small or test configuration, the distinction between data nodes and control nodes is
blurred, and most management processes are collocated with data processes for both
Hadoop/HBase and EsgynDB. So long as it meets performance and availability objectives, this
configuration is valid, especially for basic development and test clusters.
Consider the following factors when assessing the required number of nodes:
• More nodes with fewer cores is preferable to an equivalent number of cores spread over fewer
nodes, so long as the number of cores per node is reasonably modern (e.g., 8 or more) for typical
production workloads. Scaling out (increasing the number of nodes to achieve the desired
number of cores) is preferable to scaling up (increasing the cores per node to achieve the desired
number of cores) because:
o More nodes with fewer cores is typically cheaper than fewer nodes with more cores
o The domain of failure is smaller when losing a node or disk on a cluster with more nodes
o The available I/O bandwidth and parallelism is higher with more nodes
• Clusters smaller than 3 nodes are not advised, given HDFS replication requirements for
availability and recoverability.
• The number of simultaneous users (concurrency) drives the number of external corporate
network-connected nodes, as does the ingest rate for data arrival/refresh. This number determines the
total number of mxosrvr processes (see the worked example after this list). The actual connections are
distributed around the cluster based on mxosrvr process distribution. Multiple mxosrvr processes can
run on the same node.
• The types of workloads are the other key consideration for the number of nodes. The number of nodes
and cores reflects the amount of parallelism available for concurrent users of the applications
running on the cluster. If typical workloads are high-concurrency short queries, then thinner
nodes might be acceptable. If typical workloads involve large scans, then more processing power
is needed. Understand the types, frequency, plans, and typical concurrency for the application,
ideally by prototyping the workloads and queries whenever possible.
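As a worked example of the concurrency calculation above (the numbers are illustrative assumptions, not recommendations): suppose an application needs up to 400 concurrent connections on a cluster with 10 data nodes. Then:

    mxosrvrs per node = max concurrent connections / number of nodes = 400 / 10 = 40

At roughly 0.5GB per mxosrvr (see Section 3.2), those 40 mxosrvrs add about 20GB of memory demand on each data node.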
3.2 Memory Usage
When sizing an EsgynDB Enterprise cluster for memory usage, keep in mind the following considerations:
• Many Hadoop ecosystem processes are Java processes. Due to memory-efficiency
optimizations in the JVM (compressed object pointers), there is a significant threshold just below 32GB
of heap. Crossing this threshold actually results in less usable memory because the internal
representation of pointers changes in a way that consumes significantly more space.
• Large memory consumers for data nodes include:
o HDFS DataNode processes
o HBase RegionServers
• Among control processes, the large memory consumers are:
o HDFS NameNode processes
• Plan for these processes to use a heap size of 16-32GB each for optimal performance on a large
cluster. Reducing the memory for these components affects performance significantly, so do
careful tuning and analysis before choosing a smaller value.
• The primary users of memory in the EsgynDB database engine are the mxosrvrs. Plan for
512MB (0.5GB) per concurrent connection on each node, as illustrated below.
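Putting these guidelines together, a rough per-data-node memory budget might look like the following. The connection count is the illustrative assumption from Section 3.1, not a recommendation:

      64GB   Hadoop ecosystem, EsgynDB query processing, and usual overhead (per Section 4)
    + 20GB   40 mxosrvrs per node x 0.5GB per concurrent connection
    = 84GB   which fits within the recommended 64GB-128GB range (96GB is the most common value)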
3.3 Disk Usage
When sizing an EsgynDB Enterprise cluster for disk usage, keep in mind the following considerations:
• For data nodes, SSD is only beneficial for high-concurrency writes; in general, HDD is sufficient.
For control nodes, SSD is similarly not cost-effective; the goal is to have most control
information cached in memory.
• For data nodes, configure HDD data disks as direct-attached storage in a JBOD (Just a
Bunch of Disks) configuration. RAID striping slows down HDFS and actually reduces
concurrency and recoverability. For control nodes, data disks can be configured as JBOD,
RAID1, or RAID10.
• As with processing power, disks are a unit of parallelism. For a given total-disk-per-node value, if
workloads include many large scans, it is often more effective to have more, smaller disks than
fewer, larger disks per data node. The reference architecture assumes that most
workloads include large scans.
• HBase SNAPPY or GZ compression is strongly suggested. SNAPPY has less CPU overhead, but
GZ compresses better. The degree of compression varies widely depending on the data and workload
patterns, but generally accepted calculations suggest around a 30%-40% reduction, depending
on the data. Compression adds to the path length for reading and writing, which can have an effect
on data growth and ingest. Compression happens at the HBase file block level, limiting the
amount of decompression required at read time.
• When calculating overall disk space and data disk space per node, be sure to account for working
space and anticipated ingest/outflow per node. Also remember that the blocks of an HDFS file come
with a replication factor (typically set to 3, so 3 copies of the data); each 10GB
file actually occupies 30GB on disk. Esgyn recommends leaving approximately 33% of disk
space free for overhead workspace.
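As a worked example of the disk-space calculation (all numbers are illustrative assumptions): suppose the database holds 20TB of uncompressed user data, compression yields a 35% reduction, and the HDFS replication factor is 3:

    compressed data          = 20TB x 0.65          = 13TB
    on disk with replication = 13TB x 3             = 39TB
    raw capacity needed      = 39TB / 0.67 (33% free) = ~58TB
    per node (12-node cluster)                        = ~5TB of data disk per node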
3.4 Network Usage
When sizing an EsgynDB Enterprise cluster for network usage, keep in mind the following considerations:
• In general, 10GigE is the standard for data traffic networking within an EsgynDB cluster.
Using a slower network for data flow can significantly impact performance. Two bonded 10GigE
networks provide more throughput for I/O-intensive applications.
• In some cases, a second, slower network is configured for cluster (not Hadoop/HBase)
maintenance in order to keep that traffic separate from the operational data workflow.
• Consider failure scenarios when connecting nodes from different racks. The HDFS block
placement algorithm is biased toward selecting nodes on at least 2 different racks for a block's location if
the replication factor is 3 or greater.
• If using the cross-datacenter feature in EsgynDB Enterprise Advanced 2.0, there must be a high-
speed connection between the two datacenters.
• If using the cross-datacenter feature in EsgynDB Enterprise Advanced 2.0, both clusters must be
configured so that the application can actively connect to either peer cluster via EsgynDB drivers
when both are running and accessible. This capability ensures that the application can continue to
use one cluster exclusively if communication with the other is lost.
4 Reference Architecture Guidance for Production Bare Metal Cluster
This section contains recommendations for hardware configurations and software provisioning for a bare
metal EsgynDB cluster. The recommendations are vendor-independent; check with your hardware
vendor for specific part numbers and current availability.
The configuration described is for a medium or large EsgynDB installation, with separate control and data
nodes. Smaller configurations, with all processes on the same nodes, are covered in a separate section.
For Data Nodes, the basic hardware recommendation for each node is:
Resource | Recommendation
CPU | Intel Xeon or AMD 64-bit processors; 8 ≤ number of cores per node ≤ 16
Memory | 64GB for the overall Hadoop ecosystem and query processing plus usual overhead, plus 0.5GB for each mxosrvr on the node. Number of mxosrvr processes per node = (max number of concurrent connections) / (number of nodes). 64GB ≤ memory size ≤ 128GB; the most common value is 96GB
Network | 10GigE, 1GigE, or 2x10GigE bonded
Storage | SATA, SAS, or SSD; typically 12-24 1TB disks in a JBOD configuration
For Control Nodes, the basic hardware recommendation for each node is:
Resource | Recommendation
CPU | Intel Xeon or AMD 64-bit processors; 8 ≤ number of cores per node ≤ 16
Memory | 64GB for the overall Hadoop ecosystem and query processing, plus overhead for swapping and process maintenance as needed. 64GB ≤ memory size ≤ 128GB; the most common value is 96GB
Network | 10GigE, 1GigE, or 2x10GigE bonded, plus appropriate switches for off-platform to on-platform connectivity
Storage | SATA, SAS, or SSD; typically 6-12 1TB disks in a RAID1 or RAID10 configuration
4.1 Medium/Large Deployment
A medium or large deployment uses the specifications above, including separate control and data
nodes. Processes are placed on these nodes as depicted in the following figure:
In the above picture, the control nodes flank the data nodes and are only used for the DCS master
process. There’s no specific constraint for node naming conventions, including no assumption that nodes
are consecutively numbered. The vertical bars represent individual nodes, and the ovals represent
processes within the node.
4.2 Small Deployment
For a small (2-3 node, typically less than one rack) deployment, the control nodes are collapsed into the
regular node infrastructure as follows:
In the above picture, the control nodes have been removed and control processes run on the same nodes
as the functional processes.
5 Cloud Deployment
When deploying EsgynDB in a cloud environment such as Amazon's AWS, use the guidelines above to
provision resources. For configuration, use HDFS replication factor 3 if you choose instance-local store for
the file system; use HDFS replication factor 2 if you use EBS volumes, as shown below.
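As an illustrative sketch (assuming a stock Apache Hadoop configuration; verify the property against your distribution), the HDFS replication factor is set in hdfs-site.xml:

    <!-- hdfs-site.xml: HDFS block replication factor.
         Use 3 with instance-local storage; 2 may be appropriate with EBS volumes. -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>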
6 Conclusion
This EsgynDB Platform Reference Architecture document serves as a starting point for defining the
platform for an EsgynDB cluster whose primary purpose is running EsgynDB. It is also
intended to assist application developers and users in planning the deployment strategy for EsgynDB
applications. Esgyn recommends consulting with an Esgyn technical resource for additional
information, training, and guidance.