
Rankwave MOMENT™ (English)


Page 1: Rankwave MOMENT™ (English)

MOMENT™

Solution Descriptions


Page 2: Rankwave MOMENT™ (English)

Contents

• Features of MOMENT™
• Architecture
• System Topology
• Performance
• Roadmap


Page 3: Rankwave MOMENT™ (English)

Features of MOMENT™


Page 4: Rankwave MOMENT™ (English)


Features

■ As fast as possible
- In-memory computing

■ Easily scalable
- Auto scaling based on cloud system
- Auto aging data
- Auto load balancing

■ Real-time query
- Direct query from cluster
- Easy query language
- Streaming data query

■ Failover
- Replication

Page 5: Rankwave MOMENT™ (English)

As fast as possible

■ Overview (1/5)

Under the Von Neumann architecture, computation happens between the CPU and memory. So if you want a process to compute as fast as possible, the data must be located in memory, not in the file system. We still use file systems because data in memory is lost as soon as the process stops. But think about it realistically: how often do you stop a database system once a service is operating stably? If you can afford the initialization time needed to load data into memory, loading all the data to be computed is the best way to maximize computing speed. Users hardly notice that initialization time; data loading time is a concern for system operators, not users. Users only care how fast the system computes and how fast they get results from it.

Page 6: Rankwave MOMENT™ (English)

As fast as possible

■ Based on a file system (2/5)

[Diagram: the Process loads Data #1 … Data #n from the File System into Data Memory, one file I/O per data unit (Operating Data #1 … Operating Data #N).]

In the file-system case, a file I/O (loading data from the file system into data memory) must be performed once per data unit. In most big-data analyses, the number of data units to be computed is close to the total number of data units; in other words, n in the equation below is usually the total data count.

LT = Σ_{i=1..n} DT(i) + Σ_{i=1..n} PT(i)

LT: lead time, DT: data loading time, PT: data processing time

Page 7: Rankwave MOMENT™ (English)

As fast as possible

■ In-memory type (3/5)

[Diagram: at initialization, Data #1 … Data #N are loaded from the File System into Data Memory; afterwards the Process reads from memory only.]

IT = Σ_{i=1..n} DT(i)
LT = Σ_{i=1..n} PT(i)

IT: initializing time, LT: lead time, DT: data loading time, PT: data processing time

The initializing stage takes time to load data from the file system into memory. After that, no loading time is needed to compute the data.

Page 8: Rankwave MOMENT™ (English)

As fast as possible (4/5)

■ Memory is expensive?

With the emergence of 64-bit systems, the memory capacity available to a system has increased and the price of memory has become affordable. Still, cost is a concern if we have to load all the data into memory: memory costs roughly 100 times more than a hard disk and 10 times more than a solid-state disk. However, costs must be compared under conditions that meet the desired performance. Today an SSD delivers about 500 MB/s of I/O at a tenth the price of memory, but 500 MB/s does not meet the desired performance for terabyte-scale data, so distributed processing is unavoidable either way. (Handling 1 TB on a single SSD machine requires at least 2,097 seconds: 1 TB / 500 MB/s ≈ 2,097 s.) A 64-bit machine can address 16 EB of memory in theory, but operating systems actually allow 1-2 TB, and cloud services provide only up to about 200 GB per machine (in the AWS case). So loading 1 TB of data into memory requires multiple machines. To build a distributed system that handles 1 TB of data in one minute, the required number of machines is as follows.

                     File System                                   In-Memory
Number of machines   30 = 1 TB / 500 MB/s / 60 s (AWS m3.xlarge)   5 = 1 TB / 200 GB (AWS r3.8xlarge)
Cost per hour        30 × $0.532 = $15.96                          5 × $3.50 = $17.50

Using the in-memory type costs about $1.5/hour more than the file system. This is not precise, because data processing time and CPU performance are not yet considered. Nevertheless, the in-memory type is not much more expensive than the file system once data handling performance is taken into account. Moreover, AWS also charges per file I/O, so we cannot even be sure that the file system is cheaper than the in-memory type.
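The cost comparison above can be checked with a few lines of arithmetic. This is a sketch using the slide's own figures; the AWS prices are the deck's examples, not current pricing.

```python
# Figures from the slide: 30 m3.xlarge machines for the file-system setup,
# 5 r3.8xlarge machines (200 GB RAM each) for the in-memory setup.
file_system_cost = 30 * 0.532   # $/hour for 30 x m3.xlarge
in_memory_cost = 5 * 3.5        # $/hour for 5 x r3.8xlarge

extra = in_memory_cost - file_system_cost
print(f"file system: ${file_system_cost:.2f}/h, "
      f"in-memory: ${in_memory_cost:.2f}/h, extra: ${extra:.2f}/h")
# the in-memory setup costs roughly $1.5/hour more, as the text says
```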

Page 9: Rankwave MOMENT™ (English)

As fast as possible

Based on a file system:  LT = Σ_{i=1..n} DT(i) + Σ_{i=1..n} PT(i)
In-memory type:          LT = Σ_{i=1..n} PT(i)

LT: lead time, DT: data loading time, PT: data processing time

The file-system lead time is always greater than the in-memory lead time.

Big data analysis is the process of extracting meaningful results from large amounts of data, which means the value of n is always big. If a system holds 100 million units of data, each computation must process nearly all 100 million of them. So loading data into memory in advance is not a waste of memory but the optimal way to speed up computing. For example, suppose it takes 0.00001 seconds to load one unit of data into memory (in most cases it takes longer). Computing over 100 million units then adds about 16 minutes per computation (0.00001 × 100,000,000 = 1,000 seconds). Which do you prefer: waiting 16 minutes once to initialize the system, or waiting 16 extra minutes for every computation before getting a response? In conclusion, MOMENT™ chooses in-memory computing in order to maximize computing speed.

■ Conclusion: in-memory computing for performance (5/5)
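The 16-minute figure in the example above can be reproduced directly; this is just the slide's arithmetic, with its assumed per-unit load time.

```python
load_time_per_unit = 0.00001   # seconds to load one data unit (slide's assumption)
units = 100_000_000            # 100 million data units

total_load = load_time_per_unit * units   # extra cost of loading from files
print(f"{total_load:.0f} s = {total_load / 60:.1f} min per computation")

# In-memory: pay this once, at initialization. File-based: pay it every query.
```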

Page 10: Rankwave MOMENT™ (English)


Easily Scalable

■ Overview

One of the difficult things in operating a big data system is setting the optimal system size for the data size and required performance, and then adjusting that size when needed. In most cases, system expansion or reduction requires data aging (or deletion). Because the data is so large, this should be handled by automatic management software, not by people. Data aging usually takes far longer than the very short window for which a service can be stopped, so we need automatic data aging that runs without stopping the service; a shorter aging cycle is better as long as it does not affect system performance. Auto scaling basically requires automatic system expansion and reduction, for which a cloud system is the natural fit. Automatic data aging means following a policy of automatic data deletion. Although it is not technologically difficult, it requires automatic data distribution management, because automatic deletion can unbalance the distribution of data in a distributed processing system. So we also need automatic load balancing.

Page 11: Rankwave MOMENT™ (English)


Easily Scalable

■ Auto scaling based on cloud system

We can automatically control the system size according to the amount of data by using a cloud service. MOMENT™ determines the system size by itself and then sets up the equipment required for optimal operation on AWS (Amazon Web Services). MOMENT™ consists of a Central Server and Node Servers. The Central Server maintains a Memory Usage Table for the Node Servers and decides whether to expand or reduce the system according to that usage.

[Diagram: the Central Server's Data Allocator routes insert/update/delete operations from the Data Manipulation Queue to the Table Manager and Partition Manager on Node Servers #1-#10, maintains a Memory Usage Table (e.g. Node#1 54%, Node#2 51%, Node#10 48%), and provisions equipment through the AWS EC2 SDK.]
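The deck does not state the actual scaling policy, so the following sketch uses hypothetical thresholds to illustrate how a Central Server could decide to expand or reduce from its Memory Usage Table.

```python
# Hypothetical thresholds -- the deck does not state the real policy.
SCALE_UP_AVG = 0.80
SCALE_DOWN_AVG = 0.30

def scaling_decision(memory_usage):
    """Decide whether to expand or reduce the cluster from the
    Memory Usage Table kept by the Central Server."""
    average = sum(memory_usage.values()) / len(memory_usage)
    if average > SCALE_UP_AVG:
        return "expand"            # e.g. launch a node via the AWS EC2 SDK
    if average < SCALE_DOWN_AVG and len(memory_usage) > 1:
        return "reduce"            # e.g. terminate an idle node
    return "keep"

# The usage table from the slide: all nodes around 50%, so no change.
print(scaling_decision({"node1": 0.54, "node2": 0.51, "node10": 0.48}))  # keep
```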

Page 12: Rankwave MOMENT™ (English)


Easily Scalable

■ Auto aging data

System resources are always limited, so meaningless old data must be deleted periodically for resource efficiency. MOMENT™ can perform data aging without stopping the system because its data storage section is separated from its data inquiry section. Data aging in memory is much faster than on a file system, and its system load is small. Auto aging can be configured per Table (the unit that manages data). In-memory data aging is performed when 1) new data is added, 2) loaded data is updated, and 3) a cyclical batch process runs. For raw data on the file system, aging candidates are identified during memory data aging, written to an aging queue, and finally deleted by a cyclical batch process.

[Diagram: on insert/update, the Table Manager and Partition Manager check whether data should be aged; aged data is removed from memory, raw-data aging entries are pushed to a Data Aging Queue for the file system, and a batch job drains the queue to age data from the file system.]
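A minimal sketch of the aging flow described above. The row fields (`ts`, `raw_path`) are invented for illustration, and a plain deque stands in for the file-system aging queue.

```python
import time
from collections import deque

file_aging_queue = deque()   # raw-data paths, drained later by a batch job

def age_in_memory(rows, max_age_seconds, now=None):
    """Run in-memory aging (on insert/update or from a cyclical batch):
    drop rows past the table's aging policy and queue their raw-data
    paths for file-system aging."""
    now = time.time() if now is None else now
    kept = []
    for row in rows:
        if now - row["ts"] > max_age_seconds:
            file_aging_queue.append(row["raw_path"])   # aged later, on disk
        else:
            kept.append(row)
    return kept

rows = [{"ts": 0, "raw_path": "s3://old"}, {"ts": 900, "raw_path": "s3://new"}]
kept = age_in_memory(rows, max_age_seconds=600, now=1000)
print(len(kept), list(file_aging_queue))   # 1 ['s3://old']
```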

Page 13: Rankwave MOMENT™ (English)


Easily Scalable

■ Auto load balancing

Even though the Central Server allocates data to Node Servers according to memory usage, the memory usage of particular Node Servers can still rise after repeated data updates and data aging. Repeated updates can even exceed the available capacity of the assigned Node Server, in which case the excess data must be transferred to other Node Servers. MOMENT™ therefore re-allocates data locations automatically: the Central Server monitors the memory usage of each Node Server, and when a Node Server's usage rises well above the average, it automatically re-allocates a portion of that server's data to other Node Servers.

[Diagram: a batch job on the Central Server checks the balance of the Memory Usage Table (e.g. Node#1 54%, Node#2 90%, Node#10 48%); when the balance is broken, the Data Allocator requests re-allocation and the overloaded Node Server #2 gives up part of its data.]
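The re-allocation check can be sketched as follows. The 1.3× imbalance threshold is an assumption; the deck only says "higher than the average usage".

```python
def rebalance_moves(memory_usage, threshold=1.3):
    """Return (source, target) pairs for nodes whose memory usage is
    well above the cluster average; targets are the least-loaded nodes."""
    average = sum(memory_usage.values()) / len(memory_usage)
    moves = []
    for node, usage in memory_usage.items():
        if usage > average * threshold:          # "broken balance"
            target = min(memory_usage, key=memory_usage.get)
            moves.append((node, target))         # give up data to target
    return moves

# The slide's usage table: node2 at 90% against a 64% average.
print(rebalance_moves({"node1": 0.54, "node2": 0.90, "node10": 0.48}))
# [('node2', 'node10')]
```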

Page 14: Rankwave MOMENT™ (English)


Real-time Query

■ Overview

MapReduce (an analysis framework for Hadoop) is conceptually outstanding. However, Map and Reduce functions must be written for the format of the raw data, which is inconvenient because the MapReduce module has to change whenever requirements change. Moreover, results are obtained only after running MapReduce over all the data in the database, which is bad for real-time† processing. To compensate for this, people run HBase, a column-oriented NoSQL store, on top of the Hadoop framework to get real-time processing. We faced the same issue while designing MOMENT™. To address it, we designed MOMENT™ to store data column-wise using a memory map, and we placed a data analysis module in each cluster node to minimize data inquiries among nodes during real-time queries: the analysis module in each node queries its local data and produces its part of the result. The real-time query features of MOMENT™ are: 1) for efficient data analysis, it reads local data directly (no data inquiry through node-to-node communication); 2) data is queried not through programming but with a query language similar to SQL; 3) it can run repeated queries against streaming data.

[Real-time] How fast is "real time"? An architect of Impala described it as follows: "It's when you sit and wait for it to finish, as opposed to going for a cup of coffee or even letting it run overnight. That's real time."

Page 15: Rankwave MOMENT™ (English)


Real-time Query

■ Direct query from cluster

Node Servers in MOMENT™ hold Tables, and each Table has several Partitions. A query is not performed over all the data at once; it is performed per Partition, per Table, across many Nodes in parallel, and the partial results are then combined from the bottom up. This is how MOMENT™ keeps response times short enough to call real-time. For this to work, queries must be executable at the Partition level and each Node must have its own query module. Because each node queries its local data directly, MOMENT™ minimizes the number of network communications.

[Diagram: the Central Server broadcasts the query job to sub-nodes; on each Node Server (#1-#10), a Query Processor (thread pool) runs a query job for each Partition (#1-#100) of each Table, and a Synthesizer merges the partial results upward.]
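The bottom-up query path can be sketched like this. It is a simplification: real partitions hold columnar memory maps, and Python's thread pool stands in for the node's query processor.

```python
from concurrent.futures import ThreadPoolExecutor

def query_partition(partition, predicate):
    """Leaf level: scan one in-memory partition directly."""
    return [row for row in partition if predicate(row)]

def query_node(partitions, predicate):
    """Node level: the Query Processor runs one job per partition in a
    thread pool; the node's Synthesizer merges the partial results."""
    with ThreadPoolExecutor() as pool:
        parts = pool.map(lambda p: query_partition(p, predicate), partitions)
    return [row for part in parts for row in part]

def query_cluster(nodes, predicate):
    """Top level: the Central Server broadcasts the job to nodes and its
    Synthesizer combines the node results into the final answer."""
    return [row for node in nodes for row in query_node(node, predicate)]

nodes = [[[{"v": 1}, {"v": 5}], [{"v": 9}]],   # node 1: two partitions
         [[{"v": 7}], [{"v": 2}]]]             # node 2: two partitions
print(query_cluster(nodes, lambda r: r["v"] > 4))
```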

Page 16: Rankwave MOMENT™ (English)


Real-time Query

■ Easy query language

MOMENT™ supports a query syntax similar to SQL and provides a real-time query function. Queries are written in JSON format; ANSI/ISO SQL standard compliance will be supported soon.

{
  "query": {
    "select": [ { ... }, { ... } ],
    "from": "Table Name",
    "where": "Retrieving condition"
  },
  "receiver": {
    "data_format": "json_array",
    "type": "direct_http",
    "url_format": " ... "
  }
}

The "from" object takes the name of the Table to query, and the "where" object takes the retrieval condition. The condition supports logical operators (|, &) and comparison operators (>, <, =, <=, >=); precedence can be specified with parentheses. Multiple queries can be specified simultaneously with the same "from" and "where". A "select" entry is one of "list", which retrieves matching rows; "count", which counts items in the list; and "sum", which sums values over the resulting list.

The "receiver" object defines how the query result is delivered. Because results can be very large, four receiver types are supported: 1) store the result on the file system, then serve the file-inquiry information over HTTP; 2) store the result on the file system, then push the file-inquiry information onto a queue; 3) send the result as an HTTP parameter; 4) send the result as TCP/IP packets.
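As a sketch, a count query of the shape described above can be built and serialized like this. The table and column names are invented, and the exact nesting of the "select" entry is inferred from the query syntax reference.

```python
import json

# Hypothetical table/column names; the structure follows the deck's example.
query = {
    "query": {
        "select": [{
            "query_key": "q1",                     # identifies this result
            "type": "count",
            "count": {"group_by": {"column": "country", "limit": 10}},
        }],
        "from": "user_log",
        "where": "(age >= 20) & (country = KR)",   # |, & and comparisons
    },
    "receiver": {
        "data_format": "json_array",
        "type": "direct_http",                     # result sent back over HTTP
        "url_format": "http://example.com/results",
    },
}
payload = json.dumps(query)                        # wire format sent to MOMENT
print(json.loads(payload)["query"]["from"])        # user_log
```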

Page 17: Rankwave MOMENT™ (English)


Real-time Query

■ Query Syntax

select : Object — multiple queries can be given as a JSON array
  query_key : String — unique value identifying which query a returned result belongs to
  type :
    list
      column : String Array — columns to include in the result
      order_by : String — column to sort by
      sort : Number — sort ascending / descending
      limit : Number — maximum number of results per query
    count
      group_by
        column : String — column to group by
        sort : Number — sorting option when grouping
        limit : Number — maximum number of groups
        target : String Array — list of values to group on
      distinct : String — name of the column for distinct counting
    sum
      group_by
        column : String — column to group by
        sort : Number — sorting option when grouping
        limit : Number — maximum number of groups
      distinct : String — name of the column for distinct
      column : String — name of the column to sum

Page 18: Rankwave MOMENT™ (English)


Real-time Query

■ Streaming data query

MOMENT™ provides not only fast inquiry into accumulated data but also real-time filtering of continuously collected data. It provides interfaces for registering, deleting, and modifying filters and for starting and stopping them. Incoming data can be filtered on the desired conditions and the filtered data transferred to other systems.

[Diagram: a Filter Client registers/unregisters, modifies, starts, and stops filters; streaming data flows from the source through the Data Queue to the Central Server's Data Allocator, which ① allocates data to the (extensible) Node Servers, where the Streaming Filter Processor and Query Processor ② process the query, the Synthesizer ③ sends the filtered-result notification, and ④ the data is cleared.]
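A minimal sketch of the streaming-filter flow above (matching, then notification, then clearing), with an in-memory list standing in for the notification channel to the other system.

```python
def run_streaming_filter(stream, condition, notify):
    """Apply a registered filter to incoming records; matches are
    pushed to another system via `notify`, then the data is cleared
    rather than accumulated."""
    for record in stream:
        if condition(record):
            notify(record)   # step 3 in the diagram: filtered-result notification

matches = []
run_streaming_filter(
    stream=[{"user": "a", "score": 10}, {"user": "b", "score": 95}],
    condition=lambda r: r["score"] > 90,   # the filter's desired condition
    notify=matches.append,
)
print(matches)   # [{'user': 'b', 'score': 95}]
```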

Page 19: Rankwave MOMENT™ (English)


Failover

■ Overview

MOMENT™ does not replicate raw data, because raw data lives in separate storage and is loaded into memory at system initialization (in the case of AWS S3, data integrity is guaranteed). For the parts that cannot be re-initialized if data is lost inside the MOMENT™ system, core data is replicated to a separate system for failover. Because MOMENT™ uses in-memory computing, initializing the system by loading all allocated data into memory takes a long time; to mitigate this, the data to be loaded is also stored in a local file DB, which can be configured to replicate to a separate system. What is replicated for failover is: 1) the Central Server's data allocation information; 2) the raw-data paths of the data to be loaded; 3) the hash value (CRC32) of the loaded data; 4) the local cache DB of the loaded data.
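Item 3 above, the CRC32 of the loaded data, is cheap to compute and verify; for example, with Python's standard library:

```python
import zlib

def data_checksum(payload: bytes) -> int:
    """CRC32 over a block of loaded data, masked to an unsigned 32-bit
    value, as stored for failover verification."""
    return zlib.crc32(payload) & 0xFFFFFFFF

block = b"row-1|row-2|row-3"
stored_hash = data_checksum(block)          # kept alongside the raw-data path

# On recovery, reloading the raw data must reproduce the stored hash.
assert data_checksum(b"row-1|row-2|row-3") == stored_hash
print(f"{stored_hash:#010x}")
```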

Page 20: Rankwave MOMENT™ (English)


Failover

■ Replication

MOMENT™ uses Oracle Berkeley DB as its local file DB, and Berkeley DB can be replicated online as follows. An event arriving at the Master is conveyed by the Replication Manager to the event queue on the remote system, and the event is applied to the local database after the Slave acknowledges processing it. If the Master fails, the database access module sends client requests to the Slave instead. When the Slave receives an Insert, Update, or Delete event that is not a replication event, it queues those events in order to re-sync with the Master.

[Diagram: the Database Access Module puts data to the Master's Berkeley DB through its Event Handler and event queue; the Master's Replication Manager sends a replication request to the Slave's Replication Manager and receives a replication-completion response; the Slave's Event Handler puts the data to its Berkeley DB, skipping re-replication if the event is a replication event.]

Page 21: Rankwave MOMENT™ (English)

Architecture


Page 22: Rankwave MOMENT™ (English)


Layers - Data

[Diagram: the Data layer. Data sources (AWS S3, AWS DynamoDB, MySQL, MongoDB) are reached through data source gateways (SGW, MyGW, DyGW, MoGW) built on the AWS SDK and JDBC; the data management layer comprises the Job Queue Service, Berkeley DB Library, Data Allocator, Table Manager, Partition Manager, Aging checker, Load checker, Data Sync Client, and Remote BDB Service.]

Service

Data Allocator : Allocates the Node Server onto whose memory each piece of data is loaded.
Load checker : Checks memory usage and keeps the system load on the Node Servers balanced.
Table Manager : Manages schema information, allocates data into Partitions, and processes inserts, updates, and deletes.
Partition Manager : Manages the memory map.
Aging checker : Determines which data to age when data changes, and checks periodically when data does not change.
Remote BDB Service : A service that allows network access to BDB (the local DB) and replicates BDB for failover.
SGW, MyGW, DyGW, MoGW : Gateway servers for access to raw data.
Job Queue Service : A queue service for sequential processing, available both locally and remotely over the network; provides push/pop of jobs defined in JSON format.

Page 23: Rankwave MOMENT™ (English)


Layers - Query

[Diagram: the Query layer. A Query Client and a Filter Client (C/C++ library, Java library, TCP/IP) talk to the query management layer, which comprises the Job Queue Service, Thread Pool, Berkeley DB Library, Query Processor, Synthesizer, Cache Manager, and Query Parser.]

Query Client : A client that sends a query and receives its result.
Filter Client : A client that assigns filters to streaming data and receives the filtered results.
Query Processor : Broadcasts a query to the child nodes, merges the results with the Synthesizer, and sends the final result to the query or filter requestor.
Synthesizer : Creates the final result by merging the results of queries performed at the Partition or Node level.
Cache Manager : Creates caches of query results, manages cache expiration, and checks whether a cache exists (to avoid unnecessary repetition of the same query).
Query Parser : Checks the syntax of a JSON-format query and extracts the information needed to perform it.
Thread Pool : A framework for processing multiple jobs simultaneously.
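The Cache Manager's role can be sketched as a TTL cache keyed by the query text. The expiration policy here is an assumption; the deck does not state one.

```python
import time

class QueryCache:
    """Cache query results and expire them after `ttl` seconds, so the
    same query is not re-executed unnecessarily."""
    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._store = {}   # query text -> (created_at, result)

    def get(self, query, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(query)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]                 # fresh hit: skip re-running the query
        self._store.pop(query, None)        # expired or missing
        return None

    def put(self, query, result, now=None):
        self._store[query] = (time.time() if now is None else now, result)

cache = QueryCache(ttl=60.0)
cache.put("count by country", [("KR", 42)], now=0.0)
print(cache.get("count by country", now=30.0))   # [('KR', 42)]
print(cache.get("count by country", now=90.0))   # None (expired)
```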

Page 24: Rankwave MOMENT™ (English)

System topology


Page 25: Rankwave MOMENT™ (English)


MOMENT™ System

[Diagram: the Central Server (CMM) coordinates extensible Node Servers #1-#n (MM) and a Queue Manager Server (QM). Raw data is saved to Amazon S3 and reached through SGW; jobs for inserting data, carrying the Amazon S3 path as a parameter, are pushed to and popped from the queue. Data location information and data hash values are saved in the MOMENT Cache (MC, the Remote BDB Service), which has a replica (MC-Replica). Query Clients send requests to the Central Server, which fans them out as sub-node queries and sends back the query results; the Central Server also allocates, modifies, and deletes data on the nodes.]

Page 26: Rankwave MOMENT™ (English)

Performance


Page 27: Rankwave MOMENT™ (English)


Query response time & System cost

■ Query response time

[Test environment] Total record count: 1,000,000,000 · Total record size: 235.39 GB · Column count: 15 · Node Server count: 10

Query type                   Response time
Count without group-by       11.75 sec
Count with group-by          70.64 sec
Sum                          57.75 sec

■ System cost ($ per month, based on AWS)

Instance          Instance type           Count                Cost ($)
CORE (CMM, QM)    m3.large, On-Demand     1                    201.6
MM                r3.xlarge, Spot         10                   2,030.4
MC                m3.medium, On-Demand    2 (incl. replica)    201.6
Total                                                          2,433.6

Page 28: Rankwave MOMENT™ (English)

Roadmap


Page 29: Rankwave MOMENT™ (English)


Our next plans

• Security
• ANSI/ISO SQL standard compliance
• Interactive tools
• One-time installation package

Page 30: Rankwave MOMENT™ (English)

Thank you
