© Copyright 2014 EMC Corporation. All rights reserved.
The New Pivotal Big Data Suite
Jacque Istok
Store Everything. Analyze Anything. Build the Right Thing.
Pivotal Enables Hadoop Market Adoption
• Data Lakes: Unify unstructured and structured data access
• Big Data Apps: Build analytic and transaction-led applications impacting top-line revenue
• Data-Driven Enterprise: App dev and operational management on HDFS
• ETL Offload: Accommodate massive data growth with existing EDW investments
Data Architecture
Pivotal Full Approach: It’s More Than Just Hadoop
Big Data: Industry Perspective
Retail • CRM – Customer Scoring • Store Siting and Layout • Fraud Detection / Prevention • Supply Chain Optimization
Advertising & Public Relations • Demand Signaling • Ad Targeting • Sentiment Analysis • Customer Acquisition
Financial Services • Algorithmic Trading • Risk Analysis • Fraud Detection • Portfolio Analysis
Media & Telecommunications • Network Optimization • Customer Scoring • Churn Prevention • Fraud Prevention
Manufacturing • Product Research • Engineering Analytics • Process and Quality Analysis • Distribution Optimization
Energy • Smart Grid • Exploration
Government • Market Governance • Counter-Terrorism • Econometrics • Health Informatics
Healthcare & Life Sciences • Pharmaco-Genomics • Bio-Informatics • Pharmaceutical Research • Clinical Outcomes Research
How Pivotal Accelerates Value Creation
• 70% of data generated by customers
• 80% of data being stored
• 3% being prepared for analysis
• 0.5% being analyzed
• <0.5% being operationalized
First Movers
Smart Enterprises
~20X $2.9B
~30X $4B
~7X $290B
~20X $120B
Average Enterprises
SOLVE THE BIG DATA UTILITY GAP
Market Dynamics: Big Data Technologies
Applications
Analytics and Discovery
Data Organization and Management
Infrastructure
A new generation of technologies and architectures that enable economical high-velocity capture, discovery and analysis
Pivotal Data Labs
Source: IDC Predictions 2013: Big Data Battle for Dominance in the Intelligent Economy
Journey to Data Driven Enterprise
Archive
• Realize cost efficiencies and extend life of existing systems and
• Data migration
Insights
• Integrate all existing data to generate business insights
• Data analysis
Apps
• Build apps to assist or take (automated) actions from the insights generated
• Data-driven apps
Business Models
• Create new revenue streams leveraging new data and new insights
• Business transformation
Repeatable Framework
• Platform for experimenting with data-driven business models and innovation
• Experimentation platform
Technology: Data Lake, Platform as a Service
Target: Manager, IT Leaders, Business Leader, CEO
[Diagram row labels: Steps, Technology, Target]
Data Driven: Harder Than it Sounds
[Diagram, repeated for each latency class: Operationalize, Ingest, Distill, Interface, Process; Analytical and Transactional]
• Real Time: Predictive Call Routing, Fraud Prediction, Dynamic Pricing, Re-Marketing, Stream Analytics
• Near Real Time: Analytic Model Designs, Transaction Analysis, Trend Analysis
• Batch: ETL, Archive, Trending, Monthly and Weekly Jobs
Data Driven: Impossible in Silos
Finance Manufacturing Marketing IT
Data Growth Over 60% Floods These Silos
Generic Business Data Lake Architecture
• Sources
• Ingestion Tier: Real-time ingestion, Micro batch ingestion, Batch ingestion
• Distillation Tier: HDFS storage (unstructured and structured data), In-memory, MPP database
• Processing Tier: Workflow management; Real-time, Micro batch, Mega batch
• Insights Tier: Real-time insights, Interactive insights, Batch insights
• Query interfaces: SQL, NoSQL, MapReduce
• Action Tier
• Unified Operations Tier: System monitoring, System management
• Unified Data Management Tier: Data mgmt. services, MDM, RDM, Audit and policy mgmt.
Pivotal Business Data Lake
• Govern where it matters: Focus on MDM and RDM; Enforce only when sharing; Treat corporate as aggregation of local
• Encourage local requirements: Let the business decide what they need; Build from the bottom; Enable traceability to source; Disposable data views
• Distill on demand: Select only what you want; Business-friendly tooling; Re-usable information maps; Rapid change cycle
• Store everything: Store everything ‘as is’; Include structured and unstructured data; Store it cheaply
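The "store everything as is" and "distill on demand" principles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RAW_STORE` records, their field names, and the `distill` helper are hypothetical and do not come from any Pivotal product.

```python
import json

# Hypothetical raw events, kept exactly as they arrived ("as is").
RAW_STORE = [
    '{"source": "crm", "customer": "c1", "score": 0.82, "note": "called twice"}',
    '{"source": "weblog", "customer": "c1", "url": "/pricing"}',
    '{"source": "crm", "customer": "c2", "score": 0.35}',
]


def distill(raw_lines, source, fields):
    """Build a disposable view: parse raw records on read and keep only
    the requested fields from the requested source."""
    view = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("source") != source:
            continue
        # Missing fields become None instead of raising, because the
        # raw store enforces no schema.
        view.append({name: record.get(name) for name in fields})
    return view


crm_scores = distill(RAW_STORE, source="crm", fields=["customer", "score"])
print(crm_scores)
```

The raw store is never modified; each view is rebuilt from it on request, which is what makes the view disposable.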
Pivotal Business Data Lake Architecture
• Sources: Clickstream, Sensor Data, Weblogs, Network Data, CRM Data, ERP Data
• Ingestion Tier, Distillation Tier, and Insights Tier components: Pivotal HD (unstructured and structured data), Pivotal GemFire XD, GPDB / HAWQ, Spring XD, Data Loader, Sqoop, Flume, HBase, MapReduce, Hive, Pig
• Processing Tier: Spring XD, Oozie
• Query interfaces: HAWQ, Pivotal GemFire XD, HBase
• Action Tier: Pivotal GemFire, GPDB/HAWQ, Pivotal RabbitMQ, Redis, Pivotal CF
• Unified Operations Tier: Pivotal Command Center
• Unified Data Management Tier: Pivotal Data Dispatch, MDM, RDM
Pivotal Business Data Lake
• Govern where it matters: Information governance; MDM & RDM data integrated; Information RADAR approach to identification
• Encourage local requirements: HAWQ (traditional disk-based structured SQL); Pivotal GemFire XD (fast in-memory database); Pivotal GemFire XD (real-time analytics and integration)
• Distill on demand: HAWQ (structured SQL on Pivotal HD); Pivotal Data Dispatch (data movement and transformation)
• Store everything: Pivotal HD (low cost, simplified deployment)
How is a Business Data Lake Different?
Criteria | Business Data Lake | EDW
Common data model | Base class = standard data; derived classes = local data | Single class = single view across the enterprise
Data quality | Full spectrum | [graphic of binary digits]
Data integration | Multiple interfaces: SQL, SAS, R, MapReduce, NoSQL | SQL access; integration with SAS, R and other analytical interfaces
Mixed workload with varying QoS | Support low latency, interactive and batch | Limited QoS separation required
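The "base class = standard data, derived classes = local data" row can be read as ordinary class inheritance. The sketch below is an assumption about how that might look in code; the `Customer` classes and their fields are hypothetical examples, not a Pivotal data model.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Base class: the standard fields every business unit shares."""
    customer_id: str
    name: str


@dataclass
class RetailCustomer(Customer):
    """Derived class: a local field that only the retail unit needs."""
    loyalty_tier: str = "none"


@dataclass
class TelecomCustomer(Customer):
    """Derived class: a local field that only the telecom unit needs."""
    churn_risk: float = 0.0


def corporate_view(customers):
    """Aggregate local records by reading only the base-class fields."""
    return [(c.customer_id, c.name) for c in customers]


local_records = [
    RetailCustomer("c1", "Ada", loyalty_tier="gold"),
    TelecomCustomer("c2", "Grace", churn_risk=0.4),
]
print(corporate_view(local_records))
```

Each unit keeps its own local fields, while the corporate view treats every record as a plain `Customer`. An EDW-style single class would instead force every unit onto one shared set of fields.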
Components of a Business Data Lake
• Action – Redis / Pivotal RabbitMQ – Pivotal GemFire – Pivotal CF
• Unified Data Management – Pivotal Data Dispatch
• Unified Operations – Pivotal Command Center
• Storage – Structured – Unstructured
• Ingestion – Pivotal GemFire XD – Spring XD – Pivotal HD
• Distillation – Pivotal Data Dispatch – ETL
• Processing – Pivotal HD – HAWQ – Pivotal GemFire XD
Business Data Lake Terminology
• Streaming • Micro Batch • Batch • Mega Batch • Real Time Response • Interactive Response • Near Real-time Response
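The slide lists these terms without defining them. As a rough illustration only, the sketch below contrasts per-event (streaming), small-group (micro batch), and all-at-once (batch) handling. Grouping by a fixed `batch_size` is an assumption made here for the example; real systems may cut batches by time window instead.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Group an event iterator into lists of at most batch_size items."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


events = range(7)  # stand-in for an incoming event stream

# Streaming: handle each event individually as it arrives.
streamed = [event for event in events]

# Micro batch: handle small groups of events.
batched = list(micro_batches(events, batch_size=3))

# Batch: handle everything collected so far in one pass.
full_batch = list(events)

print(batched)
```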
Pivotal HD Architecture
[Diagram; legend: Apache, Pivotal]
• Hadoop components: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume
• Resource Management & Workflow: YARN, ZooKeeper, Oozie
• Pivotal HD Enterprise: Command Center (Configure, Deploy, Monitor, Manage); Spring XD; Spring; Virtual Extensions
• HAWQ, Advanced Database Services (ANSI SQL + Analytics): Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining
• Pivotal GemFire XD, Real-Time Database Services (ANSI SQL + In-Memory): Distributed In-memory Store, Query, Transactions, Ingestion, Processing, Hadoop Driver (Parallel with Compaction)
• MADlib Algorithms
• GraphLab, Open MPI
Pivotal HD Value
• Cost-based Query Optimizer
• ANSI SQL Compliant
• Linear, incremental scalability on COTS hardware
• Deep Analytic OLAP Queries
• Petabyte Data Storage & Management
• Low latency updates and transactions
• Partitioned Events in situ w/ data
• Active-active deployment across WAN
[Diagram labels: OLAP, OLTP, SQL, HDFS]
Data Lake Interfaces
Ingestion (columns: Streaming, Micro batch, Batch, Mega batch)
Data Loader: Yes Yes Yes
GemFire XD: Yes
PDD
Spring XD: Yes Yes Yes Yes
Sqoop: Yes Yes
Distcp: Yes Yes
Flume: Yes Yes Yes
HDFS put: Yes Yes
Talend: Yes Yes
Informatica: Yes Yes
Interface (columns: Real time, Interactive, Batch)
GemFire XD (SQL): Yes Yes
HAWQ (SQL): Yes Yes Yes
Hive (HiveQL): Yes
HBase (NoSQL): Yes Yes
MapReduce: Yes
Pig: Yes
Impala (SQL): Yes Yes
BI Tools (columns: GemFire XD, HAWQ, Hive)
MicroStrategy: Yes Yes
BusinessObjects: Yes Yes
Spotfire: Yes Yes
Tableau: Yes Yes
Microsoft Excel: Yes Yes
Datameer: Yes Yes
Karmasphere: Yes Yes
Legend: Pivotal, Apache, Partner, Competition
[Diagram labels: Pivotal Data Dispatch, Monitoring, Data Management, Configuration, Install, Pivotal Command Center, Data access, Ingestion, Analytics]
Data Ingestion
[Matrix: event collection (Files, Events) against event processing (High throughput, Low throughput); cells are labeled Streaming, Micro batch, Batch, Mega batch, and Real time, and list Pivotal GemFire XD, Spring XD, and Data Loader, with one cell marked N/A]
• Spring XD: Out-of-the-box support for HTTP, Tail, Mail, Twitter, Pivotal GemFire, TCP, JMS, Pivotal RabbitMQ, Time, MQTT, …
• Data Loader: Move massive amounts of data at wire speed with throttling capabilities.
• Pivotal GemFire XD: SQL insert of data into Pivotal GemFire XD, and an API to send data to Pivotal GemFire XD.
Data Access
• HAWQ: SQL query for interactive data access. Connectivity with industry-standard BI tools.
• Hive, HBase, MapReduce: HiveQL and MapReduce for batch data access. HBase for real-time lookup and simple data queries.
• Pivotal GemFire XD: SQL queries, NoSQL, and alerting APIs for real-time data. Data persisted on HDFS is immediately available for interactive queries.
[Matrix: Analytics, Lookup, and Query against Batch, Real time, and Interactive; cells list HAWQ; Hive, MapReduce; Pivotal GemFire XD; HBase, MapReduce, Pig]
Data distillation: MapReduce, Pig. Use connectors, programs, and models to convert to structured data.
[Matrix: event access methods (Unstructured, Structured interfaces) against event storage (Unstructured, Structured); cells list SQL, HiveQL, HBase APIs, MapReduce, Pig]
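The three access descriptions on this slide amount to a lookup from response-time need to query interface. The sketch below encodes only what those descriptions state. The `ACCESS_OPTIONS` table and the `interfaces_for` helper are hypothetical names for illustration, not part of any Pivotal API, and the mapping is deliberately narrower than the Yes/No grid on the Data Lake Interfaces slide.

```python
# Latency class -> interfaces named for it in the slide's descriptions.
ACCESS_OPTIONS = {
    "real time": ["Pivotal GemFire XD", "HBase"],
    "interactive": ["HAWQ"],
    "batch": ["Hive", "MapReduce"],
}


def interfaces_for(latency):
    """Return the interfaces the slide associates with a latency class."""
    key = latency.strip().lower()
    if key not in ACCESS_OPTIONS:
        raise ValueError(f"unknown latency class: {latency!r}")
    return ACCESS_OPTIONS[key]


print(interfaces_for("Interactive"))
```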
Data Distillation
Connectors from Hadoop
Pivotal Greenplum Database
Pivotal GemFire/SQLFire
[Matrix axes: Processing platform (Native Hadoop), Data storage (Native HDFS)]
HAWQ, Pivotal GemFire XD
PXF connectors
The Scenario Yesterday…
Pivotal Component | Application Type | Pricing Metric
Greenplum DB | Database | Data storage: tiered terabytes
Pivotal HD | Hadoop Distributed File System | Nodes
HAWQ | Parallel Query Engine | Nodes
GemFire XD* | In-Memory Data Grid for Hadoop | TBD
SQLFire | In-Memory Data Grid with SQL Layer | CPUs and Add Ons with restrictions
GemFire | In-Memory Data Grid | CPUs and Add Ons with restrictions
Other add-on products: Pivotal Data Dispatch, Alpine Chorus
* GemFire XD will be included upon GA (est. Q2 2014)
The Scenario Yesterday…
Pricing Metric columns: Pivotal Component, Application Type, SKU, Unit of Measure, Price
SKUs are numbered 1 to 6.
• Greenplum DB: Database
• Pivotal HD: Hadoop Distributed File System
• HAWQ: Parallel Query Engine
• GemFire XD*: In-Memory Data Grid for Hadoop
• SQLFire: In-Memory Data Grid with SQL Layer
• GemFire: In-Memory Data Grid
* GemFire XD will be included upon GA (est. Q2 2014)
• World’s Leading Experts: Pivotal Labs, Pivotal Data Labs
• On Demand Services: Pivotal Data Dispatch
• BATCH: Unlimited Pivotal HD
• INTERACTIVE: HAWQ, Greenplum DB
• REAL-TIME: GemFire XD, GemFire | SQLFire
Customer Centric Model
UNLIMITED PIVOTAL HD INCLUDED
Software Only
Core Based
Subscription Based
Flexible Licensing
Customer Incentives
How Does it Work in Practice?
Store Everything
• Obsessively collect data
• Keep it forever
• Put the data in one place
Analyze Anything
• Cleanse, organize, and manage your data lake
• Make the right tools available
• Use the resources wisely to compute, analyze, and understand data
Build the Right Thing
• Use insights to iteratively improve your product
Store Everything. Analyze Anything. Build the Right Thing.
Measure the Value
http://www.gopivotal.com/big-data/pivotal-big-data-suite/value-tool
Compare the Status Quo
http://www.gopivotal.com/big-data/pivotal-big-data-suite/value-tool
Forecast the Growth
http://www.gopivotal.com/big-data/pivotal-big-data-suite/value-tool