EsgynDB Enterprise 2.0 Platform Reference Architecture
This document outlines a Platform Reference Architecture for EsgynDB Enterprise, which is built on the Apache Trafodion™ (Incubating) implementation with licensed support and extensions. It presents server configurations for various vendors and describes considerations for sizing an EsgynDB application.
1 INTRODUCTION
2 ARCHITECTURE
3 CAPACITY PLANNING
3.1 PROCESSING USAGE
3.2 MEMORY USAGE
3.3 DISK USAGE
3.4 NETWORK USAGE
4 REFERENCE ARCHITECTURE GUIDANCE FOR PRODUCTION BARE METAL CLUSTER
4.1 MEDIUM/LARGE DEPLOYMENT
4.2 SMALL DEPLOYMENT
5 CLOUD DEPLOYMENT
6 CONCLUSION
1 Introduction
The Apache Trafodion (Incubating) project provides a full transactional SQL database integrated into the
Apache Hadoop™ ecosystem to support operational workloads. EsgynDB Enterprise 2.0, built on Apache
Trafodion, offers a fully supported, enterprise-ready version with extensions for additional features,
including cross-datacenter support in EsgynDB Enterprise Advanced 2.0.
This reference architecture describes a purpose-built EsgynDB Enterprise installation. Specifically, it
describes the architecture and provisioning for a cluster whose purpose is running one or more EsgynDB
application workloads.
This reference architecture does not describe a configuration where EsgynDB Enterprise is part of a
wider Hadoop cluster running other ecosystem applications such as MapReduce. Clusters running mixed
workloads can start from the sizing and provisioning information here, but the final sizing and provisioning
must also incorporate requirements from the other workloads in the cluster; that analysis is beyond the
scope of this document.
2 Architecture
Apache Trafodion provides an enterprise-class, web-scale database engine in the Hadoop ecosystem. In
addition, Trafodion provides the SQL query language and transactional semantics for native Apache HBase™
and Apache Hive™ tables. Trafodion provides transactional support for data stored in HBase: it
supports fully distributed ACID transactions across multiple statements, tables, and rows, which enables
EsgynDB Enterprise to support operational workloads that are generally beyond most Hadoop ecosystem
components.
EsgynDB Enterprise Release 2.0 extends Apache Trafodion by providing additional features such as
cross-datacenter support, using the architecture depicted below.
The architecture involves one or more clients concurrently using SQL queries to access the data
managed by EsgynDB via a driver (ODBC/JDBC/ADO.NET). The driver library provides the connection
and session between the application (which might or might not execute on the same cluster) and the
SQL engine layer.
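To illustrate the driver layer, the following sketch shows a Java client connecting through the JDBC Type 4 driver and running a query. The host name, port, user, password, and table are placeholders; confirm the driver class name, URL format, and default port against your EsgynDB client installation documentation.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class EsgynDBConnectExample {
        public static void main(String[] args) throws Exception {
            // Placeholder endpoint: the DCS Master is the initial connection point;
            // it assigns the session to an mxosrvr on one of the data nodes.
            String url = "jdbc:t4jdbc://dcs-master.example.com:23400/:";

            // Trafodion Type 4 JDBC driver class, shipped with the client package.
            Class.forName("org.trafodion.jdbc.t4.T4Driver");

            try (Connection conn = DriverManager.getConnection(url, "dbuser", "dbpassword");
                 Statement stmt = conn.createStatement();
                 // Placeholder query against a hypothetical table.
                 ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM trafodion.seabase.orders")) {
                while (rs.next()) {
                    System.out.println("Row count: " + rs.getLong(1));
                }
            }
        }
    }

Once the session is established, SQL statements flow through the mxosrvr hosting the session, which compiles the query and coordinates execution as described next.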
In the SQL engine layer, a master query execution server process prepares and executes each query.
Depending on the specifics of the workload, the query might involve a distributed transaction
manager, or one or more groups of executor server processes (ESPs) that execute portions of the query
plan in parallel. These groups of ESPs (for a given query, there might be zero or more groups) reflect
the degree of parallelism for the query.
The query can reference native HBase or Hive tables as well. Ultimately, EsgynDB uses HDFS as the
storage layer foundation, with an appropriate replication factor (usually 3, but 2 in some cloud
configurations) to provide availability if a node fails.
Significant processes used for query processing include:
Process Name | Description | Distribution | Count
DCS Master | Initial connection point for locating a session-hosting mxosrvr | On one single node | One active per cluster, often configured with a floating IP for high availability
DCS Server | Process that manages status and connection usage for mxosrvr processes | On each node | One for each node where mxosrvrs run
Master executor (mxosrvr) | Master executor process that hosts the SQL session, does query compilation and execution of the root operator | Multiple on all data nodes in the instance | Count defines the maximum number of concurrent sessions
Executor Server Process (ESP) | Executes parallel fragments of SQL plans | Multiple run on all data nodes in the cluster, in variable-size groups | Workload dependent: determined by concurrent parallel queries, query plan, and degree of parallelism
DTM | Maintains transactional state and log outcome information for transactions | Runs on all data nodes in the instance | One per data node
For the EsgynDB Enterprise 2.0 version, cross-datacenter support is implemented via the DTM, which
communicates with Transaction Manager processes on the peer datacenter clusters to replicate the
transactions on both clusters.
The EsgynDB Manager was simplified in the architecture picture above to show its relationship to the
query processing engine. The EsgynDB Manager subsystem expands to multiple processes, as depicted
in the following picture:
EsgynDB Manager processes include:
Process Name | Description | Distribution | Count
DB Manager | Web application server that the browser connects to | On one single node | One per cluster, on the first data node
OpenTSDB | Lightweight service processes for collecting time-series metrics | On each node | One per node
TCollectors | Collection scripts that collect time-based metrics at an interval | Multiple on all data nodes in the instance; processes per node vary | System and HBase metrics are collected on each node; EsgynDB metrics are collected cluster-wide from a process on the first data node
REST Server | Process that handles REST requests from on- and off-cluster clients | One per cluster | One per cluster, on the first data node
In addition to the listed processes used for query-processing and manageability, there are other
processes that are part of the EsgynDB stack supporting its runtime execution environment. These
processes generally use fewer resources and have little material impact on platform sizing and
provisioning.
EsgynDB Enterprise is integrated into the Hadoop ecosystem as depicted in the following picture:
The EsgynDB database engine uses HBase for storage services. As such, it relies on HBase
configuration and tuning to achieve optimal performance. EsgynDB cluster provisioning must incorporate
HBase configuration considerations.
HBase processes can be divided into two classes: control processes and data processes. Control
processes are singletons involved in managing the HBase system and its metadata. Data processes are
involved in serving the data itself, including reading, updating, and writing (HBase scan, get, and put
operations).
HBase control processes include:
Process | Description
HMaster | Metadata and table creation/deletion
ZooKeeper | Not an HBase process, but used for information management and coordination across nodes
HBase data processes include:
Process | Description
RegionServer | Controls data serving, including servicing get/put, and separation of data into individual regions
HBase in turn uses HDFS services for scalability, availability, and recovery (replication) within the cluster.
As such, EsgynDB cluster provisioning must also incorporate HDFS configuration considerations,
including replication. HDFS control processes are singletons that manage the file system and control the
placement of individual data blocks. Data processes are involved in reading and writing that data.
HDFS control processes include:
Process | Description
NameNode | Manages the metadata files that are used to map blocks to individual files and to select locations for replication
Secondary NameNode | Takes a checkpoint of all metadata from the NameNode once per interval (one hour by default). This data can be used to recreate the block-to-file mappings if the NameNode is lost; however, it is not simply a hot backup for the NameNode
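The checkpoint interval is controlled through standard HDFS configuration. As an illustrative sketch (assuming a stock Apache Hadoop 2.x configuration; verify the property name and default against your distribution), the one-hour default corresponds to:

    <!-- hdfs-site.xml: seconds between Secondary NameNode checkpoints (3600 = 1 hour) -->
    <property>
      <name>dfs.namenode.checkpoint.period</name>
      <value>3600</value>
    </property>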
HDFS data processes include:
Process | Description
DataNode | Serves reads and writes for individual files, and sends periodic "I'm alive" (heartbeat) messages, including the files/blocks it is managing, to the NameNode
In addition to the HBase and HDFS control processes listed above, other control node processes include:
Process | Description
Management Server Process | Node hosting the management web UI (Ambari, Cloudera Manager, etc.). Some management servers also perform detailed database and analytic functions
In smaller clusters, control processes and data processes might reside on the same node. For larger
clusters, management processes have significantly different provisioning requirements and so are often
isolated on different nodes. The reference architecture assumes separate control and data nodes.
3 Capacity Planning
This section discusses issues and recommendations to consider when sizing an EsgynDB Enterprise
database.
3.1 Processing Usage
When sizing the processing power for an EsgynDB Enterprise cluster, consider the following:
• In a typical high-performance configuration, nodes for management are configured separately
from data nodes. The two types are typically provisioned differently for storage (size,
configuration) as well as network and memory.
• In a very small or test configuration, the distinction between data nodes and control nodes is
blurred, and most management processes are collocated with data processes for both
Hadoop/HBase and EsgynDB. So long as it meets performance and availability objectives, this
configuration is valid, especially for basic development and test clusters.
Consider the following factors when assessing the required number of nodes:
• More nodes with fewer cores is preferable to an equivalent number of cores spread over fewer
nodes, so long as the number of cores per node is reasonably modern (e.g., 8 or more) for typical
production workloads. Scaling out (increasing the number of nodes to achieve the desired
number of cores) is preferable to scaling up (increasing the cores per node to achieve the desired
number of cores) because:
o More nodes with fewer cores is typically cheaper than fewer nodes with more cores
o The domain of failure is smaller when losing a node or disk on a cluster with more nodes
o The available I/O bandwidth and parallelism is higher with more nodes
• Clusters smaller than 3 nodes are not advised, given HDFS replication requirements for
availability and recoverability.
• The number of simultaneous users (concurrency) drives the number of external corporate
network-connected nodes, as does the ingest rate for data arrival/refresh. This number determines the
total number of mxosrvr processes (see the worked example after this list). The actual connections are
distributed around the cluster based on mxosrvr process distribution. Multiple mxosrvr processes can
run on the same node.
• The types of workloads are the other key consideration for the number of nodes. The number of nodes
and cores reflects the amount of parallelism available for concurrent users of the applications
running on the cluster. If typical workloads are high-concurrency short queries, then thinner
nodes might be acceptable. If typical workloads involve large scans, then more processing power
is needed. Understand the types, frequency, plans, and typical concurrency for the application,
ideally by prototyping the workloads and queries whenever possible.
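As a worked example of the concurrency calculation above (the numbers are illustrative assumptions, not recommendations): suppose an application needs up to 400 concurrent connections on a cluster with 10 data nodes. Then:

    mxosrvrs per node = max concurrent connections / number of nodes = 400 / 10 = 40

At roughly 0.5GB per mxosrvr (see Section 3.2), those 40 mxosrvrs add about 20GB of memory demand on each data node.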
3.2 Memory Usage
When sizing an EsgynDB Enterprise cluster for memory usage, keep in mind the following considerations:
• Many Hadoop ecosystem processes are Java processes. Due to memory-efficiency
optimizations in the JVM (compressed object pointers), there is a significant threshold just below 32GB
of heap. Crossing this threshold actually results in less usable memory because the internal
representation of pointers changes in a way that consumes significantly more space.
• Large memory consumers for data nodes include:
o HDFS DataNode processes
o HBase RegionServers
• Among control processes, the large memory consumers are:
o HDFS NameNode processes
• Plan for these processes to use a heap size of 16-32GB each for optimal performance on a large
cluster. Reducing the memory for these components affects performance significantly, so do
careful tuning and analysis before choosing a smaller value.
• The primary users of memory in the EsgynDB database engine are the mxosrvrs. Plan for
512MB (0.5GB) per concurrent connection on each node, as illustrated below.
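Putting these guidelines together, a rough per-data-node memory budget might look like the following. The connection count is the illustrative assumption from Section 3.1, not a recommendation:

      64GB   Hadoop ecosystem, EsgynDB query processing, and usual overhead (per Section 4)
    + 20GB   40 mxosrvrs per node x 0.5GB per concurrent connection
    = 84GB   which fits within the recommended 64GB-128GB range (96GB is the most common value)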
3.3 Disk Usage
When sizing an EsgynDB Enterprise cluster for disk usage, keep in mind the following considerations:
• For data nodes, SSD is only beneficial for high-concurrency writes; in general, HDD is sufficient.
For control nodes, SSD is similarly not cost-effective; the goal is to have most control
information cached in memory.
• For data nodes, configure HDD data disks as direct-attached storage in a JBOD (Just a
Bunch of Disks) configuration. RAID striping slows down HDFS and actually reduces
concurrency and recoverability. For control nodes, data disks can be configured as JBOD,
RAID1, or RAID10.
• As with processing power, disks are a unit of parallelism. For a given total-disk-per-node value, if
workloads include many large scans, it is often more effective to have more, smaller disks than
fewer, larger disks per data node. The reference architecture assumes that most
workloads include large scans.
• HBase SNAPPY or GZ compression is strongly suggested. SNAPPY has less CPU overhead, but
GZ compresses better. The degree of compression varies widely depending on the data and workload
patterns, but generally accepted calculations suggest around a 30%-40% reduction, depending
on the data. Compression adds to the path length for reading and writing, which can have an effect
on data growth and ingest. Compression happens at the HBase file block level, limiting the
amount of decompression required at read time.
• When calculating overall disk space and data disk space per node, be sure to account for working
space and anticipated ingest/outflow per node. Also remember that the blocks of an HDFS file come
with a replication factor (typically set to 3, so 3 copies of the data); each 10GB
file actually occupies 30GB on disk. Esgyn recommends leaving approximately 33% of disk
space free for overhead workspace.
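As a worked example of the disk-space calculation (all numbers are illustrative assumptions): suppose the database holds 20TB of uncompressed user data, compression yields a 35% reduction, and the HDFS replication factor is 3:

    compressed data          = 20TB x 0.65          = 13TB
    on disk with replication = 13TB x 3             = 39TB
    raw capacity needed      = 39TB / 0.67 (33% free) = ~58TB
    per node (12-node cluster)                        = ~5TB of data disk per node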
3.4 Network Usage
When sizing an EsgynDB Enterprise cluster for network usage, keep in mind the following considerations:
• In general, 10GigE is the standard for data traffic networking within an EsgynDB cluster.
Using a slower network for data flow can significantly impact performance. Two bonded 10GigE
networks provide more throughput for I/O-intensive applications.
• In some cases, a second, slower network is configured for cluster (not Hadoop/HBase)
maintenance in order to keep that traffic separate from the operational data workflow.
• Consider failure scenarios when connecting nodes from different racks. The HDFS block
placement algorithm is biased toward selecting nodes on at least 2 different racks for a block's location if
the replication factor is 3 or greater.
• If using the cross-datacenter feature in EsgynDB Enterprise Advanced 2.0, there must be a high-
speed connection between the two datacenters.
• If using the cross-datacenter feature in EsgynDB Enterprise Advanced 2.0, both clusters must be
configured so that the application can actively connect to either peer cluster via EsgynDB drivers
when both are running and accessible. This capability ensures that the application can continue to
use one cluster exclusively if communication with the other is lost.
4 Reference Architecture Guidance for Production Bare Metal Cluster
This section contains recommendations for hardware configurations and software provisioning for a bare
metal EsgynDB cluster. The recommendations are vendor-independent; check with your hardware
vendor for specific part numbers and current availability.
The configuration described is for a medium or large EsgynDB installation, with separate control and data
nodes. Smaller configurations, with all processes on the same nodes, are covered in a separate section.
For Data Nodes, the basic hardware recommendation for each node is:
Resource | Recommendation
CPU | Intel Xeon or AMD 64-bit processors; 8 ≤ number of cores per node ≤ 16
Memory | 64GB for the overall Hadoop ecosystem and query processing plus usual overhead, plus 0.5GB for each mxosrvr on the node. Number of mxosrvr processes per node = (max number of concurrent connections) / (number of nodes). 64GB ≤ memory size ≤ 128GB; the most common value is 96GB
Network | 10GigE, 1GigE, or 2x10GigE bonded
Storage | SATA, SAS, or SSD; typically 12-24 1TB disks in a JBOD configuration
For Control Nodes, the basic hardware recommendation for each node is:
Resource | Recommendation
CPU | Intel Xeon or AMD 64-bit processors; 8 ≤ number of cores per node ≤ 16
Memory | 64GB for the overall Hadoop ecosystem and query processing, plus overhead for swapping and process maintenance as needed. 64GB ≤ memory size ≤ 128GB; the most common value is 96GB
Network | 10GigE, 1GigE, or 2x10GigE bonded, plus appropriate switches for off-platform to on-platform connectivity
Storage | SATA, SAS, or SSD; typically 6-12 1TB disks in a RAID1 or RAID10 configuration
4.1 Medium/Large Deployment
A medium or large deployment uses the specifications above, including separate control and data
nodes. Processes are placed on these nodes as depicted in the following figure:
In the above picture, the control nodes flank the data nodes and are only used for the DCS master
process. There’s no specific constraint for node naming conventions, including no assumption that nodes
are consecutively numbered. The vertical bars represent individual nodes, and the ovals represent
processes within the node.
4.2 Small Deployment
For a small (2-3 node, typically less than one rack) deployment, the control nodes are collapsed into the
regular node infrastructure as follows:
In the above picture, the control nodes have been removed and control processes run on the same nodes
as the functional processes.
5 Cloud Deployment
When deploying EsgynDB in a cloud environment such as Amazon's AWS, use the guidelines above to
provision resources. For configuration, use HDFS replication factor 3 if you choose instance-local store for
the file system; use HDFS replication factor 2 if you use EBS volumes, as shown below.
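As an illustrative sketch (assuming a stock Apache Hadoop configuration; verify the property against your distribution), the HDFS replication factor is set in hdfs-site.xml:

    <!-- hdfs-site.xml: HDFS block replication factor.
         Use 3 with instance-local storage; 2 may be appropriate with EBS volumes. -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>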
6 Conclusion
This EsgynDB Platform Reference Architecture document serves as a starting point for defining the
platform for an EsgynDB cluster whose primary purpose is running EsgynDB. It is also
intended to assist application developers and users in planning the deployment strategy for EsgynDB
applications. Esgyn recommends consulting with an Esgyn technical resource for additional
information, training, and guidance.