© OpenBI, LLC 2010
Pentaho Big Data Overview & Demo
Presented to Chicago Big Data Meetup
April 19, 2012
Quick Quiz
When it comes to big data programming, which person are you?
A, B, or C
About OpenBI
• OpenBI is a premier professional services firm that provides affordable, best-in-class Business Intelligence, Analytics and Big Data solutions
– Performance Management-driven, directing BI investments to evidence-based business results
– Cost-Effective, leveraging open-source/low-cost technologies with high-value, quick initial-result engagements
– Reliable, created by industry experts and using repeatable best practices and templates
About Pentaho
• Leader in business analytics & data integration
• Subscription-based business model
• Achieved critical mass:
– Over 1,200 commercial customers
– Over 10,000 production deployments
– Over 185 countries
• Stewardship of most important open source analytics projects
INDUSTRY RECOGNITION
OVER 160 PARTNERS GLOBALLY
Comprehensive BI Platform
CENTRAL ADMINISTRATION, AUDITING & MONITORING
DELIVER When & Where Users Need It
• EMBEDDED
‣ ISV & Packaged Applications
‣ SaaS / Cloud Applications
• STANDALONE
‣ Web
‣ Mobile
STREAMLINE Information Delivery
VISUALIZE & Report Information In Any Style
• REPORTING
‣ Interactive
‣ Operational
‣ Enterprise
• ANALYSIS
‣ Ad hoc Exploration
‣ Multi-Dimensional
• DASHBOARDS
‣ Interactive Metrics
‣ Rich Visualizations
• DATA MINING
‣ Advanced & Predictive Analytics
METADATA LAYER
‣ In Memory Caching
‣ High Performance
‣ Relational OLAP Cubes
INTEGRATE, CLEANSE, & ENRICH DATA
‣ Direct Access
‣ Data Integration
‣ Hadoop Clustering
‣ Graphical ETL Designer
‣ Enterprise Scalability
ACCESS All Enterprise Data Sources
‣ ERP / CRM / Enterprise Apps (e.g. SAP, Oracle)
‣ Hadoop & NoSQL Data
‣ Unstructured & semi-structured (Logs, social, machine-generated)
‣ Relational Data Sources
‣ Cloud (e.g. Salesforce, Amazon, Dell)
Explosion of Big Data Solutions
• Analytic Databases
– e.g. Vertica, Vectorwise, Teradata/Aster Data, Netezza, Infobright, etc.
• Columnar, MPP, in-memory, DW appliances, OLAP databases
• Hadoop
– Apache, Cloudera, EMC Greenplum HD, Hadapt, HortonWorks, MapR
• NoSQL
– e.g. HBase, MongoDB, Cassandra, etc.
Hadoop and NoSQL Challenges
• Disconnected
– Not easily connected to data sources
• Extremely technical
– Requires highly technical resources
– Barrier to entry is high
– Long development and deployment cycles
• No “whole product”
– Lacks consistent management, systems tools
– Lacks data integration & orchestration tools
• Performance on moving data is imperative
• Not optimized for BI
– Not a database
– High latency
– Limited SQL access
Pentaho in the Big Data Ecosystem
Analytics
• Pentaho Business Analytics
• 3rd Party Tools
– R
– 3rd Party BI Tools
– Applications
Data Integration
• Kettle (Pentaho Data Integration)
– Data Integration
– Job Orchestration
– Workflow
– Scheduling
– High Performance
– Visual IDE
Big Data Mgmt
• Analytic Databases
• NoSQL Databases
• Hadoop
– Java MapReduce, Pig, Pentaho MapReduce
Pentaho as a Hadoop Client
• HDFS Integration
– Read/Write Files
– Copy/Move/Delete
– Folder Management
• Java MR Execution
• Pig Script Execution
• Hive
– Queries
– Scripts
[Diagram: PDI, Hadoop]
Pentaho MapReduce
Pentaho Kettle engine executing in cluster
• Pentaho MapReduce
– PDI Transforms as Map/Combine/Reduce programs
• No Node-specific installation
– Distributed Cache used to distribute Kettle jar files
• Visual MR programming and debugging IDE
[Diagram: PDI, Hadoop]
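To make the Map/Combine/Reduce split concrete, here is a minimal plain-Python sketch of the three phases on a word count. It does not use Pentaho or Hadoop APIs; the function names (`map_phase`, `combine_phase`, `reduce_phase`, `run_job`) and the sample input are hypothetical, and the "cluster" is simulated by a list of input splits processed in one process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine_phase(pairs):
    """Combine: pre-aggregate the pairs produced for a single split."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def reduce_phase(key, values):
    """Reduce: sum all partial counts that share one key."""
    return key, sum(values)


def run_job(splits):
    """Run map and combine per split, group by key, then reduce."""
    combined = [
        combine_phase(chain.from_iterable(map_phase(line) for line in split))
        for split in splits
    ]
    # Shuffle step: gather the partial counts for each key across splits.
    shuffled = defaultdict(list)
    for key, value in chain.from_iterable(combined):
        shuffled[key].append(value)
    return dict(reduce_phase(key, values) for key, values in shuffled.items())


if __name__ == "__main__":
    # Two hypothetical input splits, as if handled by two mappers.
    splits = [["big data big"], ["data integration", "big"]]
    print(run_job(splits))
```

In the Pentaho approach described on the slide, each of these three functions would instead be a PDI transformation designed graphically; the sketch only illustrates the division of work between the phases.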
Graphical Interfaces to NoSQL Databases
• Read/Write to
– HBase
– Cassandra
– MongoDB
– Others to come…
• NoSQL to RDBMS
• NoSQL to Report
[Diagram: NoSQL, PDI, Reports]
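The "NoSQL to RDBMS" path usually comes down to flattening nested documents into rows and columns. The sketch below shows that idea with Python's standard `sqlite3` module standing in for the relational target. It is not how PDI implements its steps; the helper names (`flatten`, `load_documents`), the table name and the sample documents are all hypothetical, and it assumes at least one document whose leaf values are strings, numbers, or null.

```python
import sqlite3


def flatten(doc, prefix=""):
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def load_documents(conn, table, docs):
    """Create a table covering every flattened key; insert one row per document."""
    rows = [flatten(doc) for doc in docs]
    columns = sorted({key for row in rows for key in row})
    col_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        # Keys missing from a document become NULL in that row.
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    return columns


if __name__ == "__main__":
    # Hypothetical documents, shaped like records from a document store.
    docs = [
        {"user": "ann", "geo": {"city": "Chicago", "zip": "60601"}},
        {"user": "bob", "visits": 3},
    ]
    conn = sqlite3.connect(":memory:")
    print(load_documents(conn, "events", docs))
    print(conn.execute(
        "SELECT user, geo_city, visits FROM events ORDER BY user"
    ).fetchall())
```

Taking the union of keys across documents is one simple way to handle documents with differing fields; a real pipeline would also need a policy for arrays and type conflicts.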
Graphical Job Orchestration
• Integrates classic ETL & Big Data processes.
– Scheduling
– Events
– Dependencies
– Error Handling
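The dependency and error-handling parts of orchestration can be sketched in a few lines of plain Python. This is an illustration only, not Pentaho's job engine: `run_jobs`, the job names and the simulated failure are hypothetical, scheduling and events are left out, and it relies on `graphlib` from the standard library (Python 3.9 or later). Every dependency is assumed to be a job in the same dictionary.

```python
from graphlib import TopologicalSorter


def run_jobs(jobs, dependencies):
    """Run jobs in dependency order; skip anything downstream of a failure.

    jobs: dict mapping job name -> zero-argument callable.
    dependencies: dict mapping job name -> set of job names it depends on.
    """
    status = {}
    graph = {name: dependencies.get(name, set()) for name in jobs}
    # static_order() yields each job only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        if any(status[dep] != "ok" for dep in graph.get(name, ())):
            status[name] = "skipped"
            continue
        try:
            jobs[name]()
            status[name] = "ok"
        except Exception as exc:
            # Error handling: record the failure and keep going.
            status[name] = f"failed: {exc}"
    return status


def failing_hadoop_step():
    """Stand-in for a big data step that fails."""
    raise RuntimeError("cluster unavailable")


if __name__ == "__main__":
    jobs = {
        "extract": lambda: None,
        "hadoop_step": failing_hadoop_step,
        "load_warehouse": lambda: None,
        "report": lambda: None,
    }
    dependencies = {
        "hadoop_step": {"extract"},
        "load_warehouse": {"hadoop_step"},
        "report": {"extract"},
    }
    for name, state in run_jobs(jobs, dependencies).items():
        print(name, "->", state)
```

Here a classic ETL step (`extract`) and a big data step (`hadoop_step`) sit in the same dependency graph, which is the point the slide makes about integrating the two kinds of process.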
Pentaho Big Data Value Proposition
Integrated
• One Data Programming Platform
– Big Data & Traditional ETL
• Connect Anything to Anything
– No more islands
Accessible
• MapReduce for Data Programmers
– No/Minimal Java Required
• Increase Utilization of Your Hadoop Investment
Productive
• Graphical, Integrated IDE
– Rich MR Programming Semantics
– Pre-built connectors to NoSQL & Analytic DBs and HDFS
– Same technology used for traditional ETL programs