GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Syed Rasheed, Solution Manager, Red Hat Corp.

Kenny Peeples, Technical Manager, Red Hat Corp.

Kimberly Palko, Product Manager, Red Hat Corp.

AGENDA

Demystifying Big Data

Data Virtualization: Making Big Data Available to Everyone

Red Hat Big Data Strategy and Platform

Real World Customer Example using Red Hat Big Data Platform

Demo

Roadmap

Q&A

DO WE AGREE ON WHAT BIG DATA IS?

Source: http://blogs.ifsworld.com/2013/02/how-will-big-data-influence-your-finance-team/

IT’S ALL ABOUT GAINING BUSINESS INSIGHTS

Improve product development

Optimize business processes

Improve customer care

Improve customer lifetime value

Personalize products

Competitive intelligence

…

INFORMATION AND AGILITY GAP

Over 70% of BI project effort lies in data integration: finding and identifying source data

Only 28% of users have any meaningful data access

65% Constantly changing business needs

57% IT’s inability to satisfy new requests in a timely manner

54% The need to be a more analytics-driven organization

47% Slow and untimely access to information

34% Business user dissatisfaction with IT-delivered BI capabilities

DATA CHALLENGES GETTING BIGGER FOR USERS

NoSQL

Hive

MapReduce

HDFS

Pig

Jaql

Flume

Storm

HBase

RED HAT’S BIG DATA STRATEGY

Reduce the information gap by cost-effectively making ALL data easily consumable for analytics

[Figure: Data to Actionable Information Cycle. Stages: Capture, Process, Integrate. Center label: Data Analytics]

BIG DATA FOR EVERYONE

EASY ACCESS TO BIG DATA

BI Reports & Analytics

Hive

MapReduce

HDFS

Analytical Reporting Tool

Data Virtualization Server

Hadoop

Big Data

1. Reporting tool accesses the data virtualization server via rich SQL dialect

2. The data virtualization server translates rich SQL dialect to HiveQL

3. Hive translates HiveQL to MapReduce jobs

4. MapReduce runs MR job on big data
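Steps 3 and 4 can be illustrated with a toy sketch of how a grouped aggregate breaks down into map, shuffle, and reduce phases. This is not Hive's actual planner. The `sales` table, its columns, and every function name below are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical rows of a "sales" table: (region, amount).
ROWS = [("US", 10.0), ("EU", 4.0), ("US", 5.5), ("EU", 1.0), ("APAC", 7.0)]


def map_phase(rows):
    """Emit one (key, value) pair per input row, keyed by the GROUP BY column."""
    for region, amount in rows:
        yield region, amount


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Apply the aggregate (SUM) to the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def run_query(rows):
    # Roughly what "SELECT region, SUM(amount) FROM sales GROUP BY region" computes.
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))
```

The reporting tool only ever sees the SQL statement in the comment. The map and reduce functions are what the lower layers would have to supply on its behalf.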

TURN FRAGMENTED DATA INTO ACTIONABLE INFORMATION

Connect

Compose

Consume

Data Consumers: BI Reports & Analytics, Mobile Applications, SOA Applications & Portals, ESB, ETL

Standards-based Data Provisioning: JDBC, ODBC, REST, SOAP, OData

Unified Virtual Database / Common Data Model, Data Transformations

Native Data Connectivity

JBoss Data Virtualization: Design Tools, Dashboard, Optimization, Caching, Security, Metadata

Data Sources: Hadoop, NoSQL, Cloud Apps, Data Warehouse & Databases, Mainframe, XML, CSV & Excel Files, Enterprise Apps

Siloed & Complex

Virtualize, Transform, Federate

Easy, Real-time Information Access

BENEFITS OF DATA VIRTUALIZATION ON BIG DATA

Enterprise democratization of big data

Any reporting or analytical tool can be used

Easy access to big data

Seamless integration of big data and existing data assets

Sharing of integration specifications

Collaborative development on big data

Fine-grained security of big data

Faster time-to-market for reports on big data

CONVERGENCE OF FOUR DATA TRENDS

Big Structured Data

• Transactional & Analytical

Big Streaming Data

• Events & Messages

Big Data Processing

• Hadoop

Big Unstructured Data

• Social & Interactions

Big Data Integration

COMPREHENSIVE MIDDLEWARE PLATFORM

CAPTURE, PROCESS AND INTEGRATE BIG DATA VOLUME, VELOCITY, VARIETY

BI Analytics (historical, operational, predictive)

SOA Composite Applications

Data Integration: JBoss Data Virtualization

In-memory Cache: JBoss Data Grid

Messaging and Event Processing: JBoss A-MQ and JBoss BRMS

Hadoop

Structured Data, Streaming Data, Semi-Structured Data

Red Hat Storage

Red Hat Enterprise Linux & Virtualization

Capture & Process

Integrate & Analyze

RED HAT BIG DATA PLATFORM

Integration Software

• JBoss Data Virtualization

• JBoss BRMS

• JBoss A-MQ

• JBoss Data Grid

Infrastructure Software

• Red Hat Storage

• Red Hat Enterprise Virtualization

• Red Hat Enterprise Linux

EXAMPLES: RED HAT BIG DATA PLATFORM IN THE REAL WORLD

BIG DATA IN THE UTILITIES

Objective:

Combine data from smart meters on homes with data from electricity generation and transmission and make it available to power providers

Problem:

The original smart grid project only read information from the meters on houses. This data now needs to be combined with generation and transmission data in a cost-effective way

The data points are all over the place: sensors on the lines, in the field, homes, etc.

The information must be accessible to multiple power providers through a common interface

Solution:

Use messaging to collect data from a variety of sources and route it to a CEP engine for initial filtering. Process it with Hadoop MapReduce and BRMS, then distribute the data to Data Virtualization to be combined with other sources and consumed with BI tools, to JBoss Data Grid (JDG) for in-memory caching, and/or to an archive.
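The flow in this solution (collect, filter, normalize, aggregate) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the customer's implementation. The reading format, the units, the validity rule, and all function names are hypothetical stand-ins for the messaging, CEP, and MapReduce stages.

```python
from collections import defaultdict

# Hypothetical readings from meters, line sensors, and SCADA:
# (source, site, unit, value).
READINGS = [
    ("meter", "home-1", "Wh", 1500.0),
    ("meter", "home-2", "kWh", 2.0),
    ("sensor", "line-7", "kWh", -1.0),  # invalid reading, dropped by the filter
    ("scada", "plant-A", "kWh", 120.0),
]

# Divisors that convert each supported unit to kWh.
TO_KWH_DIVISOR = {"Wh": 1000.0, "kWh": 1.0}


def initial_filter(readings):
    """Stand-in for the CEP step: drop readings that are clearly invalid."""
    return [r for r in readings if r[3] >= 0]


def normalize(readings):
    """Convert every reading to kWh so that sources can be combined."""
    return [
        (source, site, value / TO_KWH_DIVISOR[unit])
        for source, site, unit, value in readings
    ]


def total_by_source(readings):
    """Aggregate the normalized readings per source type."""
    totals = defaultdict(float)
    for source, _site, kwh in readings:
        totals[source] += kwh
    return dict(totals)


def pipeline(readings):
    return total_by_source(normalize(initial_filter(readings)))


if __name__ == "__main__":
    print(pipeline(READINGS))
```

In the architecture described here, the aggregated result would then be exposed through the data virtualization layer rather than printed.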

SMART GRID

Transmission, Generation, Consumer

Regulatory Users

Element Connection Tier: Collector (Sensors, Local Data Store), Collector (SCADA, Local Data Store), Collector (Meter, Local Data Store)

Data Adaptation & Routing Tier: Adaptor Rules, Sensor Adaptor, Routing Function

Normalized Data Tier: Normalization / MapReduce, PM Regional Translator / Scheduler

Data Tier: Offline Storage, Data Virtualization, Cache

API Exposure & Portal Tier: Authentication, Presentation, REST Exposure

Compose

PM Data Schedule, PM Data Reports, Rules Creation / Updates, PM Admin

NoSQL-Cassandra

RETAIL CUSTOMER USE CASE

GAIN BETTER INSIGHT FOR INTELLIGENT INVENTORY MANAGEMENT

Objective:

The right merchandise, at the right time and price

Problem:

Cannot utilize social data and sentiment analysis with their inventory and purchase management system

Solution:

Leverage JBoss Data Virtualization to mash up sentiment analysis data with inventory and purchasing system data. Leverage BRMS to optimize pricing and stocking decisions.

Consume, Compose, Connect

Analytical Apps

JBoss Data Virtualization

Hive

Inventory Databases

Purchase Mgmt Application

Sentiment Analysis

JBoss BRMS

Data Driven Decision Management
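A toy sketch of the decision step in this use case: sentiment and inventory data are combined per product, then simple rules pick an action. The products, scores, thresholds, and action names are illustrative assumptions only. A real deployment would express such rules in BRMS rather than in Python.

```python
# Hypothetical inputs: a sentiment score per product (-1 to 1) and units in stock.
SENTIMENT = {"jacket": 0.8, "scarf": -0.4, "boots": 0.1}
INVENTORY = {"jacket": 12, "scarf": 90, "boots": 40}


def decide(sentiment, stock):
    """Toy decision rules; the thresholds are illustrative assumptions."""
    if sentiment > 0.5 and stock < 20:
        return "restock"
    if sentiment < 0 and stock > 50:
        return "mark down"
    return "hold"


def decisions(sentiment, inventory):
    """Combine the two sources on product name, then apply the rules."""
    return {
        product: decide(sentiment[product], inventory[product])
        for product in sentiment
        if product in inventory
    }


if __name__ == "__main__":
    for product, action in decisions(SENTIMENT, INVENTORY).items():
        print(product, "->", action)
```

Keeping the rules separate from the data combination mirrors the split in the slide: the virtualization layer supplies one combined view, and the rules engine acts on it.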

DEMOS

LUCIDWORKS, JBOSS DATA VIRTUALIZATION AND RED HAT STORAGE

ABOUT LUCIDWORKS

Employs 40% of the “committers” for Lucene/Solr

Makes 50% - 70% of the enhancements to each release of Lucene/Solr

Only company to offer Open Source and Open Core Search Solutions

LUCENE/SOLR: ENABLING BETTER, DATA-DRIVEN DECISIONS

LUCIDWORKS DEMONSTRATION

• LucidWorks/Solr to provide full text search and statistics

• Data Virtualization provides the data through the Teiid JDBC driver and pulls it from Hive/Hadoop, a CSV file, and an XML file

• Red Hat Storage provides the Enterprise Data Repository

DEMONSTRATION ARCHITECTURE

DEMOS

HORTONWORKS AND JBOSS DATA VIRTUALIZATION

ABOUT HORTONWORKS

Founded in 2011 by 24 engineers from the original Yahoo! Hadoop development and operations team

Hortonworks drives innovation in the open exclusively via the Apache Software Foundation process

Hortonworks is responsible for around 50% of core code base advances to Apache Hadoop

HORTONWORKS DATA PLATFORM 2 SANDBOX

Enterprise Ready YARN, the Hadoop Operating System

Stinger Phase 2: Interactive SQL Queries at Petabyte Scale

Reliable NoSQL in Hadoop with HBase

Technical Specs

Component Version – Link

Apache Hadoop 2.2.0 – http://hortonworks.com/hadoop/

Apache Hive 0.12.0 – http://hortonworks.com/hadoop/hive

Apache HCatalog 0.12.0 – http://hortonworks.com/hadoop/hcatalog

Apache HBase 0.96.0 – http://hortonworks.com/hadoop/hbase

Apache ZooKeeper 3.4.5 – http://hortonworks.com/hadoop/zookeeper

Apache Pig 0.12.0 – http://hortonworks.com/hadoop/pig

Apache Sqoop 1.4.4 – http://hortonworks.com/hadoop/sqoop

Apache Flume 1.4.0 – http://hortonworks.com/hadoop/flume

Apache Oozie 4.0.0 – http://hortonworks.com/hadoop/oozie

Apache Ambari 1.4.1 – http://hortonworks.com/hadoop/ambari

Apache Mahout 0.8.0 – http://hortonworks.com/hadoop/mahout

Hue 2.3.0

HORTONWORKS DEMONSTRATION

Objective:

Secure data by role, using row-level security and column masking

Problem:

Cannot hide region data, such as patient data, from region-specific users

Solution:

Leverage JBoss Data Virtualization to provide Row Level Security and Masking of columns


DV Dashboard to analyze the aggregated data by User Role


Hive

SOURCE 1: Hive/Hadoop in the HDP contains US Region Data

SOURCE 2: Hive/Hadoop in the HDP contains EU Region Data

Hive
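The two techniques in this demo can be sketched independently of any product: a row filter decides which records a role may see, and a column mask hides sensitive values in the records that remain. The roles, policy structure, and sample rows below are hypothetical. JBoss Data Virtualization configures this through its own role and permission model, which this sketch does not reproduce.

```python
# Hypothetical patient rows, as if unioned from the US and EU sources.
PATIENTS = [
    {"region": "US", "name": "Alice", "ssn": "123-45-6789", "visits": 3},
    {"region": "EU", "name": "Bruno", "ssn": "987-65-4321", "visits": 5},
]

# Hypothetical role policies: visible regions and masked columns per role.
POLICIES = {
    "us_analyst": {"regions": {"US"}, "masked": {"ssn"}},
    "eu_analyst": {"regions": {"EU"}, "masked": {"ssn"}},
    "admin": {"regions": {"US", "EU"}, "masked": set()},
}


def mask(value):
    """Hide all but the last four characters of a value."""
    text = str(value)
    return "*" * (len(text) - 4) + text[-4:]


def query_as(role, rows):
    """Return the rows a role may see, with its masked columns obscured."""
    policy = POLICIES[role]
    visible = []
    for row in rows:
        if row["region"] not in policy["regions"]:
            continue  # row-level security: skip rows outside the role's regions
        visible.append(
            {col: mask(val) if col in policy["masked"] else val
             for col, val in row.items()}
        )
    return visible


if __name__ == "__main__":
    print(query_as("us_analyst", PATIENTS))
    print(query_as("admin", PATIENTS))
```

Applying both checks in the shared access layer, rather than in each report, means every consuming tool gets the same restrictions.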

HORTONWORKS DEMONSTRATION

Objective:

Determine if sentiment data from the first week of the Iron Man 3 movie is a predictor of sales

Problem:

Cannot utilize social data and sentiment analysis with sales management system

Solution:

Leverage JBoss Data Virtualization to mash up sentiment analysis data with ticket and merchandise sales data in MySQL into a single view.


Excel Power View and DV Dashboard to analyze the aggregated data


Hive

SOURCE 1: Hive/Hadoop contains Twitter data including sentiment

SOURCE 2: MySQL data that includes ticket and merchandise sales
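The idea of one query spanning two sources can be tried locally with Python's built-in `sqlite3` module, using two separate in-memory databases as stand-ins for Hive and MySQL. This is only an analogy for federation, not how JBoss Data Virtualization works internally. The table names, columns, and sample values are hypothetical.

```python
import sqlite3


def build_sources():
    """Create two separate in-memory databases reachable from one connection."""
    conn = sqlite3.connect(":memory:")  # "main" plays the Hive source
    conn.execute("ATTACH DATABASE ':memory:' AS sales")  # plays the MySQL source
    conn.execute("CREATE TABLE main.tweets (day TEXT, sentiment REAL)")
    conn.execute("CREATE TABLE sales.tickets (day TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO main.tweets VALUES (?, ?)",
        [("day-1", 1.0), ("day-1", 0.0), ("day-2", 1.0), ("day-2", 1.0)],
    )
    conn.executemany(
        "INSERT INTO sales.tickets VALUES (?, ?)",
        [("day-1", 100.0), ("day-2", 250.0)],
    )
    return conn


# One query that joins across both databases into a single view of the data.
MASHUP_SQL = """
SELECT t.day, AVG(t.sentiment) AS avg_sentiment, s.revenue
FROM main.tweets AS t
JOIN sales.tickets AS s ON s.day = t.day
GROUP BY t.day, s.revenue
ORDER BY t.day
"""


def mashup(conn):
    return conn.execute(MASHUP_SQL).fetchall()


if __name__ == "__main__":
    for row in mashup(build_sources()):
        print(row)
```

Each output row pairs a day's average sentiment with that day's revenue, which is the shape of data a dashboard would need to check whether sentiment tracks sales.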

DEMONSTRATION SYSTEM REQUIREMENTS

• JDK – Oracle JDK 1.6 or 1.7, or OpenJDK 1.6 or 1.7

• JBoss Data Virtualization v6 Beta – http://jboss.org/products/datavirt.html

• JBoss Developer Studio – http://jboss.org/products

• JBoss Integration Stack Tools (Teiid) – https://devstudio.jboss.com/updates/7.0-development/integration-stack/

• Slides, Code and References for demo – https://github.com/DataVirtualizationByExample/Mashup-with-Hive-and-MySQL

• Hortonworks Data Platform (A VM for testing Hive/Hadoop) – http://hortonworks.com/products/hdp-2/#install

• Red Hat Storage – http://www.redhat.com/products/storage-server/

JBOSS DATA VIRTUALIZATION PRODUCT ROADMAP AND BIG DATA

WHAT’S COMING: JBOSS DATA VIRTUALIZATION 6.1

Big Data

• Full connectivity support for:
    • MongoDB
    • Cloudera Impala
    • Apache Solr
• Tech Preview:
    • Cassandra
    • Accumulo

Cloud

• Alpha availability on OpenShift
• Support for:
    • Amazon Redshift
    • Amazon SimpleDB

Deployment Productivity

• Security audit log in Dashboard Builder
• Improved usability for custom translators
• EAP 6.3 support
• RHEL 7 support
• MariaDB
• Azul JVM support

WHY RED HAT FOR BIG DATA?

Transform ALL data into actionable information

Cost Effective, Comprehensive Platform

Community based Innovation

Enterprise Class Software and Support

[Figure: Data to Actionable Information Cycle. Stages: Capture, Process, Integrate. Center label: Data Analytics]

THANK YOU

Q & A