GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION
Syed Rasheed, Solution Manager, Red Hat Corp.
Kenny Peeples, Technical Manager, Red Hat Corp.
Kimberly Palko, Product Manager, Red Hat Corp.
AGENDA
Demystifying Big Data
Data Virtualization: Making Big Data Available to Everyone
Red Hat Big Data Strategy and Platform
Real World Customer Example using Red Hat Big Data Platform
Demo
Roadmap
Q&A
IT’S ALL ABOUT GAINING BUSINESS INSIGHTS
Improve product development
Optimize business processes
Improve customer care
Improve customer lifetime value
Personalize products
Competitive intelligence
…
INFORMATION AND AGILITY GAP
Over 70% of BI project effort lies in data integration: finding and identifying source data
Only 28% of users have any meaningful data access
65% Constantly changing business needs
57% IT’s inability to satisfy new requests in a timely manner
54% The need to be a more analytics-driven organization
47% Slow and untimely access to information
34% Business user dissatisfaction with IT-delivered BI capabilities
RED HAT’S BIG DATA STRATEGY
Reduce the information gap by cost-effectively making ALL data easily consumable for analytics
Data to Actionable Information Cycle: Capture, Process, Integrate
Data Analytics
EASY ACCESS TO BIG DATA
BI Reports & Analytics
Hive
MapReduce
HDFS
Analytical Reporting Tool
Data Virtualization Server
Hadoop
Big Data
1. Reporting tool accesses the data virtualization server via rich SQL dialect
2. The data virtualization server translates rich SQL dialect to HiveQL
3. Hive translates SQL to MapReduce
4. MapReduce runs MR job on big data
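Steps 3 and 4 can be pictured with a toy example. The sketch below is plain Python, not Hive or Hadoop code. It only illustrates how an aggregate such as SELECT region, COUNT(*) ... GROUP BY region decomposes into map, shuffle, and reduce phases. The readings rows and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for a big-data table: (region, reading).
readings = [("east", 10), ("west", 7), ("east", 3), ("north", 5), ("west", 1)]


def map_phase(rows):
    """Emit one (key, 1) pair per row, as a mapper for COUNT(*) would."""
    for region, _value in rows:
        yield region, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key to produce the final counts."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_region(rows):
    """Toy equivalent of: SELECT region, COUNT(*) FROM readings GROUP BY region."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


counts = count_by_region(readings)
print(counts)
```

In a real cluster the map and reduce phases run in parallel across many nodes; here they run in one process so the data flow stays visible.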
TURN FRAGMENTED DATA INTO ACTIONABLE INFORMATION
Connect, Compose, Consume
Data Consumers: BI Reports & Analytics, Mobile Applications, SOA Applications & Portals, ESB, ETL
Standards-based Data Provisioning: JDBC, ODBC, REST, SOAP, OData
JBoss Data Virtualization: Design Tools, Dashboard, Optimization, Caching, Security, Metadata
Unified Virtual Database / Common Data Model, Data Transformations
Native Data Connectivity
Data Sources: Hadoop, NoSQL, Cloud Apps, Data Warehouse & Databases, Mainframe, XML, CSV & Excel Files, Enterprise Apps
Siloed & Complex → Virtualize, Transform, Federate → Easy, Real-time Information Access
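The idea of a unified virtual database can be sketched with Python's built-in sqlite3 module. This is a stand-in for illustration only, not JBoss Data Virtualization: two attached in-memory databases play the role of siloed sources, and a view joins them so that a consumer queries one logical model. The table names, column names, and rows are hypothetical.

```python
import sqlite3


def build_virtual_view():
    """Return a connection exposing one view over two separate databases."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH creates an independent in-memory database (a "silo").
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    conn.execute("CREATE TABLE warehouse.orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO warehouse.orders VALUES (?, ?)",
        [(1, 20.0), (1, 5.0), (2, 12.5)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )

    # In SQLite only a TEMP view may reference tables in other attached databases.
    conn.execute(
        """
        CREATE TEMP VIEW customer_spend AS
        SELECT c.name AS name, SUM(o.amount) AS total
        FROM crm.customers AS c
        JOIN warehouse.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        """
    )
    return conn


conn = build_virtual_view()
rows = conn.execute(
    "SELECT name, total FROM customer_spend ORDER BY name"
).fetchall()
print(rows)
```

The consumer sees only customer_spend; where the underlying rows live is hidden behind the view, which is the essence of the Connect, Compose, Consume layering.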
BENEFITS OF DATA VIRTUALIZATION ON BIG DATA
Enterprise democratization of big data
Any reporting or analytical tool can be used
Easy access to big data
Seamless integration of big data and existing data assets
Sharing of integration specifications
Collaborative development on big data
Fine-grained security of big data
Faster time-to-market of reports on big data
CONVERGENCE OF FOUR DATA TRENDS
Big Structured Data
•Transactional & Analytical
Big Streaming Data
•Events & Messages
Big Data Processing
•Hadoop
Big Unstructured Data
•Social & Interactions
Big Data Integration
COMPREHENSIVE MIDDLEWARE PLATFORM
CAPTURE, PROCESS AND INTEGRATE BIG DATA VOLUME, VELOCITY, VARIETY
Integrate & Analyze
BI Analytics (historical, operational, predictive)
SOA Composite Applications
Data Integration: JBoss Data Virtualization
In-memory Cache: JBoss Data Grid
Capture & Process
Hadoop
Messaging and Event Processing: JBoss A-MQ and JBoss BRMS
Structured Data, Streaming Data, Semi-Structured Data
Red Hat Storage
Red Hat Enterprise Linux & Virtualization
RED HAT BIG DATA PLATFORM
Integration Software
•JBoss Data Virtualization
•JBoss BRMS
•JBoss A-MQ
•JBoss Data Grid
Infrastructure Software
•Red Hat Storage
•Red Hat Enterprise Virtualization
•Red Hat Enterprise Linux
BIG DATA IN THE UTILITIES
Objective:
Combine data from smart meters on homes with data from electricity generation and transmission and make it available to power providers
Problem:
The original smart grid project only read information from the meters on houses; this data now needs to be combined with generation and transmission data in a cost-effective way
The data points are widely scattered: sensors on the lines, in the field, in homes, etc.
The information must be accessible to multiple power providers through a common interface
Solution:
Use messaging to collect data from a variety of sources and route it to a CEP engine for initial filtering. Process the data with Hadoop MapReduce and BRMS, then distribute it to Data Virtualization to be combined with other sources and consumed with BI tools, and/or to JDG for in-memory data caching, and/or to an archive.
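The flow in this solution can be sketched as a chain of small functions. The code below is a single-process toy, not A-MQ, BRMS, Hadoop, or JDG code. The message fields, the filtering rule, and the destination names are all hypothetical, and the sketch sends results to every destination even though the slide describes them as and/or options.

```python
from collections import defaultdict

# Hypothetical messages as they might arrive from collectors (watt-hours).
messages = [
    {"source": "meter", "site": "home-1", "wh": 1200},
    {"source": "meter", "site": "home-1", "wh": -1000},  # invalid reading
    {"source": "sensor", "site": "line-7", "wh": 40000},
    {"source": "meter", "site": "home-2", "wh": 800},
]


def initial_filter(msgs):
    """Stand-in for the CEP step: drop readings that fail a simple rule."""
    return [m for m in msgs if m["wh"] >= 0]


def aggregate_by_source(msgs):
    """Stand-in for the map/reduce step: total watt-hours per source type."""
    totals = defaultdict(int)
    for m in msgs:
        totals[m["source"]] += m["wh"]
    return dict(totals)


def distribute(totals):
    """Fan the processed result out to the destinations named in the slide."""
    return {
        "data_virtualization": dict(totals),
        "cache": dict(totals),
        "archive": dict(totals),
    }


result = distribute(aggregate_by_source(initial_filter(messages)))
print(result["data_virtualization"])
```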
SMART GRID
Transmission, Generation, Consumer
Regulatory Users
Collectors (each with a Local Data Store): Sensors, SCADA, Meter
Adaptor Rules, Sensor Adaptor, Routing Function
Normalization / MapReduce
PM Regional Translator / Scheduler
Offline Storage, Data Virtualization, Cache, NoSQL-Cassandra
Authentication, Presentation, REST Exposure
Tiers: Element Connection Tier, Data Adaptation & Routing Tier, Normalized Data Tier, Data Tier, API Exposure & Portal Tier
Compose
PM Data Schedule, PM Data Reports
Rules Creation / Updates
PM Admin
RETAIL CUSTOMER USE CASE
GAIN BETTER INSIGHT FOR INTELLIGENT INVENTORY MANAGEMENT
Objective:
Right merchandise, at right time and price
Problem:
Cannot utilize social data and sentiment analysis with their inventory and purchase management system
Solution:
Leverage JBoss Data Virtualization to mash up sentiment analysis data with inventory and purchasing system data. Leverage BRMS to optimize pricing and stocking decisions.
Consume: Analytical Apps
Compose: JBoss Data Virtualization
Connect: Hive, Inventory Databases, Purchase Mgmt Application
Sentiment Analysis
JBoss BRMS
Data Driven Decision Management
ABOUT LUCIDWORKS
Employs 40% of the “committers” for Lucene/Solr
Makes 50% - 70% of the enhancements to each release of Lucene/Solr
Only company to offer Open Source and Open Core Search Solutions
LUCIDWORKS DEMONSTRATION
• LucidWorks/Solr to provide full text search and statistics
• Data Virtualization provides the data through the Teiid JDBC driver and pulls it from Hive/Hadoop, a CSV file, and an XML file
• Red Hat Storage provides the Enterprise Data Repository
ABOUT HORTONWORKS
Founded in 2011 by 24 engineers from the original Yahoo! Hadoop development and operations team
Hortonworks drives innovation in the open, exclusively via the Apache Software Foundation process
Hortonworks is responsible for around 50% of core code base advances to Apache Hadoop
HORTONWORKS DATA PLATFORM 2 SANDBOX
Enterprise Ready YARN, the Hadoop Operating System
Stinger Phase 2; Interactive SQL Queries at Petabyte Scale
Reliable NoSQL in Hadoop with HBase
Technical Specs
Component           Version
Apache Hadoop       2.2.0
Apache Hive         0.12.0
Apache HCatalog     0.12.0
Apache HBase        0.96.0
Apache ZooKeeper    3.4.5
Apache Pig          0.12.0
Apache Sqoop        1.4.4
Apache Flume        1.4.0
Apache Oozie        4.0.0
Apache Ambari       1.4.1
Apache Mahout       0.8.0
Hue                 2.3.0
HORTONWORKS DEMONSTRATION
Objective:
Secure data by role using row-level security and column masking
Problem:
Cannot hide region-specific data, such as patient data, from users tied to a specific region
Solution:
Leverage JBoss Data Virtualization to provide row-level security and column masking
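Row-level security and column masking can be illustrated with a few lines of plain Python. This is a conceptual sketch only and does not show how JBoss Data Virtualization defines or enforces its policies. The role names, the policy table, the mask string, and the sample rows are all hypothetical.

```python
# Hypothetical rows as they might come back from the two regional sources.
PATIENTS = [
    {"region": "US", "patient": "Alice Smith", "visits": 3},
    {"region": "EU", "patient": "Bruno Keller", "visits": 5},
]

# Hypothetical policies: regions a role may see and columns hidden from it.
POLICIES = {
    "us_analyst": {"regions": {"US"}, "masked": {"patient"}},
    "eu_analyst": {"regions": {"EU"}, "masked": {"patient"}},
    "admin": {"regions": {"US", "EU"}, "masked": set()},
}


def query_as(role, rows=PATIENTS):
    """Return the rows a role may see, with masked columns replaced."""
    policy = POLICIES[role]
    visible = []
    for row in rows:
        # Row-level security: skip rows outside the role's regions.
        if row["region"] not in policy["regions"]:
            continue
        # Column masking: hide the values of protected columns.
        visible.append(
            {
                col: ("****" if col in policy["masked"] else val)
                for col, val in row.items()
            }
        )
    return visible


print(query_as("us_analyst"))
print(query_as("admin"))
```

The same query yields different results per role, which is what lets one dashboard serve both US and EU users without exposing the other region's records.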
Consume: DV Dashboard to analyze the aggregated data by User Role
Compose: JBoss Data Virtualization
Connect: Hive
SOURCE 1: Hive/Hadoop in the HDP contains US Region Data
SOURCE 2: Hive/Hadoop in the HDP contains EU Region Data
HORTONWORKS DEMONSTRATION
Objective:
Determine if sentiment data from the first week of the Iron Man 3 movie is a predictor of sales
Problem:
Cannot utilize social data and sentiment analysis with sales management system
Solution:
Leverage JBoss Data Virtualization to mash up sentiment analysis data with ticket and merchandise sales data in MySQL into a single view of the data.
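Once a single view holds one row per day with both a sentiment score and a sales figure, the predictor question can be explored with a correlation coefficient. The sketch below joins two per-day series and computes Pearson's r in plain Python. All values are invented for illustration and are not results from the demo; the function and variable names are hypothetical.

```python
from math import sqrt

# Invented per-day values, keyed by day of the first week.
sentiment_by_day = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0, 5: 5.0}
sales_by_day = {1: 2.0, 2: 4.0, 3: 6.0, 4: 8.0, 5: 10.0}


def single_view(sentiment, sales):
    """Join the two sources on day, keeping only days present in both."""
    days = sorted(sentiment.keys() & sales.keys())
    return [(day, sentiment[day], sales[day]) for day in days]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


view = single_view(sentiment_by_day, sales_by_day)
r = pearson([row[1] for row in view], [row[2] for row in view])
print(r)
```

A coefficient near 1 or -1 would suggest a strong linear relationship; correlation alone does not establish that sentiment drives sales.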
Consume: Excel Power View and DV Dashboard to analyze the aggregated data
Compose: JBoss Data Virtualization
Connect: Hive
SOURCE 1: Hive/Hadoop contains Twitter data including sentiment
SOURCE 2: MySQL data that includes ticket and merchandise sales
DEMONSTRATION SYSTEM REQUIREMENTS
• JDK: Oracle JDK 1.6 or 1.7, or OpenJDK 1.6 or 1.7
• JBoss Data Virtualization v6 Beta: http://jboss.org/products/datavirt.html
• JBoss Developer Studio: http://jboss.org/products
• JBoss Integration Stack Tools (Teiid): https://devstudio.jboss.com/updates/7.0-development/integration-stack/
• Slides, Code and References for demo: https://github.com/DataVirtualizationByExample/Mashup-with-Hive-and-MySQL
• Hortonworks Data Platform (a VM for testing Hive/Hadoop): http://hortonworks.com/products/hdp-2/#install
• Red Hat Storage: http://www.redhat.com/products/storage-server/
WHAT’S COMING: JBOSS DATA VIRTUALIZATION 6.1
Big Data
•Full connectivity support for:
  •MongoDB
  •Cloudera Impala
  •Apache Solr
•Tech Preview
  •Cassandra
  •Accumulo
Cloud
•Alpha availability on OpenShift
•Support for:
  •Amazon Redshift
  •Amazon SimpleDB
Deployment Productivity
•Security audit log in Dashboard builder
• Improved usability for custom translator
•EAP 6.3 support
•RHEL 7 support
•MariaDB
•Azul JVM support
BENEFITS OF DATA VIRTUALIZATION ON BIG DATA
Enterprise democratization of big data
Any reporting or analytical tool can be used
Easy access to big data
Seamless integration of big data and existing data assets
Sharing of integration specifications
Collaborative development on big data
Fine-grained security of big data
Faster time-to-market of reports on big data
WHY RED HAT FOR BIG DATA?
Transform ALL data into actionable information
Cost Effective, Comprehensive Platform
Community based Innovation
Enterprise Class Software and Support