
VoltDB on SoftLayer Cloud


Delivering Real-time Applications via SoftLayer Cloud-based VoltDB

Table of Contents

Abstract

Introduction

The Needs for Real-time Analytics

Cloud-based Real-time Analytics

The VoltDB Installing Process

The Steps for developing and running a VoltDB Application in SoftLayer Cloud

Replicating the Same in a Ubuntu Environment

Abstract

Data has become a strategic asset that lets an organization plan ahead precisely and proceed with confidence and clarity. Data-driven enterprises are increasingly regarded as the ones best placed for continued success, because they can overcome unexpected business challenges and changes. That is, any enterprise that systematically subjects the data it gathers from different and distributed sources to a series of IT-enabled analytics processes, using end-to-end platforms to extract actionable insights, is more likely to attain and retain success over the long run. Now that data is being generated and captured in unprecedented volumes, traditional data analytics platforms and infrastructures face a variety of constraints. We therefore need robust and resilient algorithms and IT solutions for big and fast data. Several product vendors, having recognized these challenges, are bringing forth big data analytics systems that methodically turn captured and consolidated data into information and then into knowledge. In this document, we explain the capabilities of VoltDB that enable the much-demanded high-performance, real-time big data analytics.

Introduction

Data virtualization, databases, warehouses, data marts and cubes, business intelligence (BI) and visualization solutions are critical for knowledge extraction and engineering, and thus for realizing the growing family of smarter systems and services envisaged by the smarter planet vision. VoltDB is a high-performance, scalable relational database management system (RDBMS) for big data, high-velocity OLTP and real-time analytics. Described as a NewSQL database, VoltDB is designed to run on modern scale-out computing infrastructures. Unlike legacy RDBMS products and NoSQL data stores, VoltDB enables high-velocity applications without requiring complex and costly sharding layers or compromising transactional data integrity (ACID) to gain performance and scale. VoltDB provides:

Database throughput reaching millions of operations per second

On-demand scaling

High availability, fault tolerance and database durability

Real-time data analytics

It is clear that pioneering platforms and infrastructures are indispensable for extracting actionable insights from data heaps. The picture below illustrates the need for different kinds of databases. One of them is the growing demand for high-velocity database management systems.

The prominent requirements of a typical high-velocity DBMS include the following:

Ingest data at very high speeds

Scale adeptly to meet growth spikes and demand peaks

Support integrated fault tolerance

Support a wide range of real-time analytics

Integrate easily with high-volume analytic data sources

The Needs for Real-time Analytics

Several business domains need real-time analytics. Telecommunications and online media companies, financial services, public utilities and national defense departments are the leading ones seeking this competency. However, conventional IT systems fall short in performing real-time analysis and extracting information to act upon: they cannot ingest, pre-process and mine data fast enough. This explains the surging popularity of solutions such as VoltDB in enterprise and cloud IT environments. VoltDB overcomes this velocity hurdle with an in-memory relational database that combines high-velocity data ingestion, massive scalability, real-time analytics and decision enabling. VoltDB narrows the “ingestion-to-decision” gap from minutes to milliseconds.

Organizations that have implemented real-time database solutions record even greater rewards by also integrating them with a back-end database for deeper analytics. Joining these two specialized database engines in a coordinated manner allows organizations to mine historical data for deeper analytical insights and then combine those results with the data ingestion engine for real-time consumption. It is a closed-loop process that delivers new value from a previously untapped and underutilized class of data. Together, VoltDB and IBM Netezza close the loop between real-time and long-term historical data, connecting the front and back ends of big data. This closed-loop system merges Netezza’s deep analysis of troves of historical data with the in-the-moment decisions and analytics of VoltDB. Organizations can leverage the IBM Netezza integration built into VoltDB. Applications and dashboards can interact with both systems via SQL and combine data to present a complete picture, both historical and “now,” to users.

VoltDB’s IBM Netezza export client fetches transactional data from VoltDB and writes it, in batches, to the Netezza database. Users automate the export process by identifying specific VoltDB tables in the schema as sources for export data. At runtime, any data written to the specified tables is automatically sent to the VoltDB export connector, which in turn passes the updated information to the Netezza destination. The VoltDB export process queues export data to the connector transactionally and automatically. The export client uses a series of poll and acknowledgement requests to exchange data transactionally between VoltDB and Netezza. The export client runs within the VoltDB cluster and is highly available. VoltDB can also export data to other targets, for example a Hadoop system.

Cloud-based Real-time Analytics

The emerging trend is that clouds are being positioned as the core, central and cognitive IT infrastructure for all kinds of IT platforms and middleware solutions. In this section, we elaborate on how VoltDB was taken to the IBM SoftLayer cloud and used to produce a real-time analytics PoC implementation.

Detailing the Steps of VoltDB Migration to SoftLayer Cloud

We can choose any Linux image, such as CentOS, Ubuntu or Red Hat, in a SoftLayer cloud environment to install and configure the VoltDB system. We have omitted the details of getting an SL account and provisioning VMs and bare metal servers in the SL cloud, as these are covered in our previous assets.

1) Copy LINUX-voltdb-ent-4.7.tar.gz to the SoftLayer VM using WinSCP; the target location is the /home directory.

2) Give root permission to that file.

3) Open the console of the SL VM and change to that directory with “cd /home”, as illustrated below.

4) Unpack the distribution kit as a folder in the home directory of your personal account using the command “tar -zxvf LINUX-voltdb-ent-4.7.tar.gz -C $HOME/”.

5) The following shell command unpacks the VoltDB software under “/opt”; together with the rename in the next step, this installs it in the folder “/opt/voltdb”:

“sudo tar -zxvf LINUX-voltdb-ent-4.7.tar.gz -C /opt”

6) Then, from /opt, rename the “voltdb-ent-4.7” directory to “voltdb” with the command “sudo mv voltdb-ent-4.7 voltdb”.

7) To check whether Java is installed, use the command “java -version”. If the command reports a Java version, there is no need to install it. Otherwise, install Java with the command “sudo yum install java-1.7.0-openjdk-devel”:

[root@skypetervoltdb opt]# sudo yum install java-1.7.0-openjdk-devel

To check the Java version afterwards, use the command “java -version” as depicted below.
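Steps 4 to 7 can be collected into a small script. What follows is a minimal sketch rather than a supported installer: the helper names install_kit and have_cmd are our own (hypothetical), and the sketch assumes the archive unpacks into a single top-level directory named voltdb-ent-4.7, as the steps above imply.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of VoltDB.

# Succeeds when a command (for example java) is found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Unpack a kit into a prefix and rename the unpacked directory to "voltdb".
#   $1 = path to the tar.gz kit     (e.g. /home/LINUX-voltdb-ent-4.7.tar.gz)
#   $2 = installation prefix        (e.g. /opt)
#   $3 = directory name in the kit  (assumed: voltdb-ent-4.7)
install_kit() {
    kit=$1
    prefix=$2
    unpacked=$3
    if [ ! -f "$kit" ]; then
        echo "kit not found: $kit" >&2
        return 1
    fi
    if [ -e "$prefix/voltdb" ]; then
        echo "$prefix/voltdb already exists" >&2
        return 1
    fi
    tar -zxf "$kit" -C "$prefix" || return 1
    mv "$prefix/$unpacked" "$prefix/voltdb"
}
```

On the SoftLayer VM this would be called, with root privileges, as “install_kit /home/LINUX-voltdb-ent-4.7.tar.gz /opt voltdb-ent-4.7”, followed by “have_cmd java || sudo yum install java-1.7.0-openjdk-devel”.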

The VoltDB Installing Process

In this section, we explain how to install VoltDB in a SoftLayer virtual environment.

1) Run the command “ln -s $HOME/voltdb-ent-4.7 $HOME/voltdb-ent”.

2) Add the voltdb-ent bin directory to the PATH with the command “export PATH=$PATH:$HOME/voltdb-ent/bin”.

3) Then switch to the directory “”.

4) Now list its contents with “ls -l”:

[root@skypetervoltdb voter]# ls -l
total 564
-rw-r--r-- 1 root root 57631 Oct 20 00:55 catalog-report.html
-rw-rw-r-- 1 507 1001 1720 Sep 9 09:09 ddl.sql
-rw-rw-r-- 1 507 1001 166 Sep 9 09:09 deployment.xml
drwxr-xr-x 2 root root 4096 Oct 20 00:55 log
drwxr-xr-x 3 root root 4096 Oct 20 00:54 obj
-rw-rw-r-- 1 507 1001 3911 Sep 9 09:09 README
-rwxrwxr-x 1 507 1001 5918 Sep 9 09:09 run.sh
drwxrwxr-x 3 507 1001 4096 Sep 9 09:09 src
drwxr-xr-x 2 root root 4096 Oct 20 00:55 statement-plans
-rwxrwxrwx 1 root root 421971 Oct 14 03:43 VoltDB-app-nbbo-0bd619d.zip
drwxr-xr-x 8 root root 4096 Oct 20 00:55 voltdbroot
-rw-r--r-- 1 root root 33389 Oct 20 00:55 voter.jar
drwxrwxr-x 4 507 1001 4096 Sep 9 09:09 web

5) Then run VoltDB with “./run.sh”; you will come across the screens below.

By following the steps enumerated above, we installed VoltDB in a SoftLayer VM successfully.
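An “export PATH=...” typed at the prompt lasts only for the current shell session. A minimal sketch of one way to make the setting repeatable follows; add_to_path is a hypothetical helper name of ours and is not shipped with VoltDB.

```shell
#!/bin/sh
# Sketch only: add_to_path is a hypothetical helper name.
# Appends a directory to PATH unless it is already listed, so the call can
# sit in a shell profile and be sourced repeatedly without growing PATH.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$PATH:$1" ;;      # append once
    esac
    export PATH
}
```

Placed in a profile file, the call for this installation would be “add_to_path "$HOME/voltdb-ent/bin"”.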

The Steps for developing and running a VoltDB Application in SoftLayer Cloud

NBBO (National Best Bid and Offer) is a sample application we have chosen to demonstrate how VoltDB plays its role here.

NBBO is defined as the lowest available ask price and the highest available bid price across the participating markets for a given security. Brokers should route trade orders to the market with the best price and by law must guarantee customers the best available price. This application includes a VoltDB database schema that stores each market data tick and automatically inserts a new NBBO record whenever there is a change to the best available bid or ask price. This can be used to serve the current NBBO or the history of NBBO changes on demand to consumers such as the dashboard or other applications. The example includes a web dashboard that shows the real-time NBBO for a security and the latest available prices from each exchange.
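The definition itself is easy to state in code. The sketch below only illustrates the rule (highest bid, lowest ask); it is not the sample application's schema or stored procedure. The function name nbbo and the input layout “exchange symbol bid ask” are assumptions made for this illustration.

```shell
#!/bin/sh
# Sketch only: the input layout "exchange symbol bid ask" is an assumption
# made for this illustration, not the sample application schema.

# Reads ticks on stdin and prints "symbol best_bid best_ask" for symbol $1,
# where best_bid is the highest bid and best_ask is the lowest ask.
nbbo() {
    awk -v sym="$1" '
        $2 == sym {
            b = $3 + 0; a = $4 + 0            # force numeric comparison
            if (!seen || b > bid) bid = b     # highest available bid
            if (!seen || a < ask) ask = a     # lowest available ask
            seen = 1
        }
        END { if (seen) printf "%s %.2f %.2f\n", sym, bid, ask }
    '
}
```

For example, feeding it one line per exchange for the same security and calling “nbbo IBM” prints the best bid and ask across those exchanges; a symbol with no ticks produces no output.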

1) Download the NBBO application from this page: http://voltdb.github.io/app-nbbo/

2) Copy the application into the directory “/opt/voltdb/examples” and give root permission to that zip file.

3) Unzip the file using the command “unzip VoltDB-app-nbbo-0bd619d.zip”.

4) On entering the application folder with “cd VoltDB-app-nbbo-0bd619d”, we see the contents of VoltDB-app-nbbo-0bd619d as illustrated below.

6) Now we can start the web server inside the SoftLayer VM using the command “./run.sh start_web”.

7) Open Firefox using the address <http://ip:8081> and we can see the application UI as follows.

The above screen shows the line “Starting database, this may take 30 seconds”. This is because we have not yet started the application server. To start the server, change directory with “cd /opt/voltdb/examples/voter” and run the command “./run.sh server”. Once the server is up and running, also run the command “./run.sh client” so that results show in the client application.

Now you can refresh the web browser (in this case, Firefox) to see the latest result.
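Because the client depends on the server being up, it can help to wait for readiness instead of starting the client immediately. The sketch below is a generic polling helper; wait_until is a hypothetical name of ours, and the readiness check passed to it is left open because this document does not specify one.

```shell
#!/bin/sh
# Sketch only: wait_until is a hypothetical helper name.
# Runs a check command once per second until it succeeds or the number of
# attempts is used up.
#   $1     = maximum number of attempts
#   $2...  = the check command and its arguments
wait_until() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0           # check passed, stop waiting
        tries=$((tries - 1))
        sleep 1
    done
    return 1                       # gave up
}
```

A call would look like “wait_until 30 some_readiness_check && ./run.sh client”, where some_readiness_check stands for any command that succeeds once the database accepts connections.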

If you want to see the catalog reports, click the Catalog reports button on the right side of the screen.

Now we open the VoltDB console to execute commands. Click the Overview button on the toolbar menu and you will see the screen below.

The next step is to click on the button “VoltDB Studio” and the following page will appear.

Then on clicking the Connect button, it will ask for the details such as the Username and the password of VoltDB server.

Username: root
Password: xxxxx

After these steps, connectivity with the VoltDB database is established. To check the speed, we issued a command and the result was returned within a second. It took some seconds without VoltDB; with VoltDB in place, the response time is sub-second, as illustrated in the snapshots above and below. Now we try a complex query to see how long it takes to return the result. Please examine the screen images carefully to understand the real differentiator.

This is the complex query; the results come back in sub-second time.

SELECT a.contestant_name AS contestant_name
     , a.contestant_number AS contestant_number
     , SUM(b.num_votes) AS total_votes
  FROM v_votes_by_contestant_number_state AS b
     , contestants AS a
 WHERE a.contestant_number = b.contestant_number
 GROUP BY a.contestant_name
        , a.contestant_number
 ORDER BY total_votes DESC
        , contestant_number ASC
        , contestant_name ASC;
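To make explicit what this query computes (a join, a per-contestant sum and a three-key ordering), the same logic can be emulated on two small CSV files. This is only an illustration of the semantics, not how VoltDB executes the query; the function name vote_totals and both file layouts are assumptions.

```shell
#!/bin/sh
# Sketch only: emulates the query semantics on two small CSV files.
# Assumed layouts (hypothetical, for illustration):
#   contestants file: contestant_number,contestant_name
#   votes file:       contestant_number,state,num_votes
# Prints contestant_name,contestant_number,total_votes ordered by
# total_votes DESC, contestant_number ASC, contestant_name ASC.
vote_totals() {
    awk -F, '
        NR == FNR  { name[$1] = $2; next }     # first file: contestants
        $1 in name { total[$1] += $3 }         # join, then SUM(num_votes)
        END { for (n in total) printf "%s,%d,%d\n", name[n], n, total[n] }
    ' "$1" "$2" | sort -t, -k3,3nr -k2,2n -k1,1
}
```

As with the inner join in the SQL, vote rows whose contestant number is not in the contestants file are ignored, and contestants without any vote rows do not appear in the output.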

Replicating the Same in a Ubuntu Environment

The above experiment was done on a CentOS VM. Now we explain how the same can be accomplished through an Ubuntu OS image.

1) Copy the VoltDB software onto the server.

2) Place the file in your installation directory; we have chosen /opt.

3) Change directory to /opt: cd /opt

4) Unpack the kit with the command: tar zxvf LINUX-voltdb-3.7.0.4.tar.gz

5) Check whether the prerequisites are already installed.

6) Install aptitude with the command “apt-get install aptitude”.

7) Install the build essentials package with the command “aptitude install build-essential”.

8) Install a Java JDK with the command “aptitude install openjdk-7-jdk”.

9) Then start the server with the command “./run.sh server”.

10) Then start the client with the command “./run.sh client”.
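The only package-level difference between the two walkthroughs is the Java install command. A small sketch that records both choices in one place follows; java_install_cmd is a hypothetical helper name, and the package names are simply the ones used in this document.

```shell
#!/bin/sh
# Sketch only: java_install_cmd is a hypothetical helper name.
# Prints the Java install command used in this document for a given
# package manager: yum on the CentOS VM, aptitude on the Ubuntu VM.
java_install_cmd() {
    case "$1" in
        yum)      echo "yum install java-1.7.0-openjdk-devel" ;;
        aptitude) echo "aptitude install openjdk-7-jdk" ;;
        *)        echo "unsupported package manager: $1" >&2
                  return 1 ;;
    esac
}
```

The printed command is meant to be reviewed and then run with root privileges; the helper itself installs nothing.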

Conclusion

The intention of this work is to experiment with and demonstrate the fitness of cloud environments for real-time analytics on big data. Some time back, we did a PoC on leveraging IBM Netezza for real-time analytics of big data through its massive-scale parallelization. Now VoltDB has been moved to the IBM SoftLayer cloud to investigate how it enables real-time analytics. Furthermore, VoltDB has a data export connector that ties VoltDB to Netezza, bringing flexible real-time analytics of fast, big and historical data that yields actionable insights in time. The results shown in this document demonstrate the distinctive strengths of VoltDB in facilitating next-generation real-time and real-world applications for the smarter planet.

Authors

Pethuru Raj & Skylab Vanga
IBM Global Cloud Center of Excellence (CoE)
IBM India, Bangalore, India 560045
E-mail IDs – [email protected] [email protected]