The Next Disruption – Managing the Data Explosion or How to build a Data Lake
Pete Melroy – Solutions Manager
Kam Wong – Solutions Architect
Information Builders 3i – Solution Architecture
Portal Embedded InfoApps™
Applications, Legacy Systems, Relational/Cubes, Big Data, Columnar/In Memory, Unstructured, Social Media, Web Services, Trading Partners
Integration
Mobile Write-Back
Data Discovery, Reporting, Dashboards
High-Performance Data Store
Data Quality, Data Governance
Master Data Management
Batch ETL, Real-Time ESB
Integrity
Intelligence
Location Analytics
In-Document Analytics
Casting and Archiving
Search, Predictive Analytics
Sentiment and Word Analytics
Performance Management
SocialHot
BadFeedback
90% of all the data in the world has been generated over the last 2 years
Data Output is growing rapidly (chart, 2009 to 2015)
What’s Going On? And Why All the Disruption?
Some Other Things to Think About…
By 2020, every human will generate 1.7 megabytes every… second
570 new websites are created every… minute
The world generates about 2.5 exabytes of data every… day
Mankind has spoken about 5 exabytes of words… ever
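To put the figures above on a common scale, here is a back-of-the-envelope calculation. It uses only the numbers quoted on the slide; decimal units (1 GB = 1000 MB) are an assumption, and the variable names are illustrative.

```python
# Back-of-the-envelope arithmetic using only the figures quoted on the slide.
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

mb_per_person_per_second = 1.7   # slide figure (projection for 2020)
websites_per_minute = 570        # slide figure
world_exabytes_per_day = 2.5     # slide figure
spoken_words_exabytes = 5.0      # slide figure

# Assumption: decimal units, so 1 GB = 1000 MB.
gb_per_person_per_day = mb_per_person_per_second * SECONDS_PER_DAY / 1000
websites_per_day = websites_per_minute * MINUTES_PER_DAY
days_to_match_all_speech = spoken_words_exabytes / world_exabytes_per_day

print(f"{gb_per_person_per_day:.2f} GB per person per day")
print(f"{websites_per_day} new websites per day")
print(f"{days_to_match_all_speech:.0f} days of data output match all words ever spoken")
```

Taken at face value, the slide's numbers imply that two days of global data output equal the volume of everything mankind has ever spoken.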
Traditional in Transition to Modern
Fewer use cases
More use cases
Modern: Hadoop, IoT, Streaming, Virtual DW, Data Lake
Traditional: OLTP, OLAP, Data warehouses, Data marts, Point-to-point Integration, EII
Real-World Strategies for Deploying Big Data
iWay Big Data Integrator - 100% Run “in” Hadoop architecture
Simplified interface
Native Hadoop script generation
Process mgmt. & governance
Simplified easy-to-use interface to integrate in Hadoop
Marshals Hadoop resources and standards
Takes advantage of performance and resource negotiation
Includes sophisticated process management and governance
Sqoop, Flume…
Avro, JSON…
Traditional applications and data stores
iWay Big Data Integrator
Simplified, modern, native Hadoop integration
Big Data Hadoop
Any distribution, Any data
IoT/Cleansing/Predictive – Logical Reference Architecture
Data in Motion
Data at Rest
Flume
Other Process
Sqoop
Agent 2
Agent 1
Agent 3
Producers
Spark Processing + Data Quality
Spark Data Quality
HDFS (Predictive) Analytics
RDBMS Data
Kafka Topics (each drawn with cells 0 1 2)
Job 1
Job 2
Job 3
Streaming Content, IoT, Application Data
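The reference architecture routes producers through Kafka topics to three jobs. The sketch below is a toy in-memory model of that flow, not the Kafka client API. It assumes the cells numbered 0, 1 and 2 in the diagram are partitions, and it assigns keys to partitions with CRC32, which is an illustrative choice rather than Kafka's own partitioner. `ToyTopic`, `partition_for` and the sensor readings are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumption: the cells 0, 1, 2 in the diagram are partitions


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class ToyTopic:
    """In-memory stand-in for a partitioned topic; not the Kafka client API."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS) -> None:
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def produce(self, key: str, value: dict) -> int:
        """Append a message to the partition chosen by its key."""
        p = partition_for(key, self.num_partitions)
        self.partitions[p].append((key, value))
        return p

    def consume(self, partition: int) -> list:
        """Return the messages of one partition in arrival order."""
        return list(self.partitions[partition])


topic = ToyTopic()
# Hypothetical readings standing in for "Streaming Content, IoT, Application Data".
readings = [("sensor-a", 20.5), ("sensor-b", 19.0), ("sensor-a", 21.0), ("sensor-c", 18.2)]
for sensor_id, temp in readings:
    topic.produce(sensor_id, {"temp": temp})

# One job per partition, mirroring Job 1 to Job 3 in the diagram.
for job_id in range(NUM_PARTITIONS):
    batch = topic.consume(job_id)
    print(f"Job {job_id + 1} read {len(batch)} message(s) from partition {job_id}")
```

Keying by sensor keeps each sensor's readings in order within a single partition, which is what lets the downstream jobs work independently.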
Hadoop in the Data Warehouse: Ingest, Transform and Load
Unstructured
Semi-Structured
Structured
External
Data Lake
Raw Data
Data Ponds
BI Apps
Raw Data, Actionable Data, World Class Analytics
Data Marts
Operational System
Operational System
Data Marts
Profile, cleanse, master, etc.
Ingest without coding
Create w/relevant data
Data Swamp
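The "profile, cleanse, master" step that turns raw data into actionable data can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the product's implementation: the records, field names, and the rule "merge on email, first non-missing value wins" are all hypothetical.

```python
from collections import Counter

# Hypothetical raw records as they might land in the lake untouched.
raw_records = [
    {"name": " Alice Smith ", "email": "ALICE@EXAMPLE.COM", "city": None},
    {"name": "alice smith", "email": "alice@example.com", "city": "Boston"},
    {"name": "Bob Jones", "email": "bob@example.com", "city": "Chicago"},
]


def profile(records):
    """Count missing values per field."""
    missing = Counter()
    for rec in records:
        for field, value in rec.items():
            if value is None or str(value).strip() == "":
                missing[field] += 1
    return dict(missing)


def cleanse(rec):
    """Trim whitespace and normalize case; leave missing values as None."""
    return {
        "name": rec["name"].strip().title() if rec["name"] else None,
        "email": rec["email"].strip().lower() if rec["email"] else None,
        "city": rec["city"].strip() if rec["city"] else None,
    }


def master(records, key="email"):
    """Merge records sharing a key; the first non-missing value per field wins."""
    golden = {}
    for rec in records:
        merged = golden.setdefault(rec[key], dict(rec))
        for field, value in rec.items():
            if merged.get(field) is None and value is not None:
                merged[field] = value
    return list(golden.values())


print(profile(raw_records))
curated = master([cleanse(r) for r in raw_records])
for row in curated:
    print(row)
```

Here the two Alice records collapse into one after cleansing, and the missing city is filled from the duplicate. Skipping this step is what leaves a lake at risk of becoming the "data swamp" named on the slide.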
Big Data Analytics Demonstration
HDFS Cluster
Data Sources
Managed Data Lake
Big Data Integrator
Development Tools
Data Visualization
WebFOCUS
Business Intelligence
Information Builders Big Data Analytics
real-time
streaming data
deploy / run
Flume Agent: source, channel, sink
Data Wrangling
structured
Mapping
Transform
Sqoop
Predictive
Modeling
SparkR
Data Quality
Cleansing
Match/Merge
Data Nodes
ELT
data prep
IoT
Flat
ERP
CRM
Enterprise Data
enhance data
real-time
real-time fraud detection analytics dashboard
Users
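The Flume agent in this demonstration is drawn as a source, a channel and a sink. The following toy model shows how those three parts relate. It is an in-memory illustration and does not use Flume itself; `ToyAgent`, the batch size and the sample events are hypothetical.

```python
import queue


class ToyAgent:
    """Source -> channel -> sink, modeled on the three parts of the agent drawing."""

    def __init__(self, capacity: int = 100) -> None:
        # The channel buffers events between the source and the sink.
        self.channel = queue.Queue(maxsize=capacity)
        # Stands in for HDFS or another downstream store.
        self.delivered = []

    def source(self, lines):
        """Wrap each incoming line as an event and put it on the channel."""
        for line in lines:
            self.channel.put({"body": line.rstrip("\n")})

    def sink(self, batch_size: int = 2):
        """Drain the channel in batches and deliver each batch downstream."""
        while not self.channel.empty():
            batch = []
            while len(batch) < batch_size and not self.channel.empty():
                batch.append(self.channel.get())
            self.delivered.append(batch)


agent = ToyAgent()
agent.source(["claim 1001 filed\n", "claim 1002 filed\n", "claim 1003 filed\n"])
agent.sink(batch_size=2)
print([len(b) for b in agent.delivered])  # [2, 1]
```

The channel decouples the two ends: the source can keep accepting events while the sink writes them out in batches.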
Insurance Predictive Application: Real-Time Fraud Detection
Insurance Predictive Application: scores incoming claims for fraud likelihood to prioritize investigations
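Scoring claims and then ranking them for investigation can be sketched as follows. The weights, features and claims are hypothetical and chosen only for illustration; a real model would be trained on historical claims. The slides do not describe the model itself, so the logistic form used here is an assumption.

```python
import heapq
import math

# Hypothetical weights; a real model would be trained, not hand-set.
WEIGHTS = {"amount_over_10k": 1.5, "new_policy": 1.0, "prior_claims": 0.6}
BIAS = -2.0


def fraud_score(claim: dict) -> float:
    """Logistic score in (0, 1); higher means more likely to warrant review."""
    z = BIAS
    z += WEIGHTS["amount_over_10k"] * (claim["amount"] > 10_000)
    z += WEIGHTS["new_policy"] * (claim["policy_age_days"] < 90)
    z += WEIGHTS["prior_claims"] * claim["prior_claims"]
    return 1.0 / (1.0 + math.exp(-z))


def prioritize(claims):
    """Return (claim id, score) pairs ordered from highest to lowest score."""
    # heapq is a min-heap, so scores are negated to pop the highest first.
    heap = [(-fraud_score(c), c["id"]) for c in claims]
    heapq.heapify(heap)
    ordered = []
    while heap:
        neg_score, claim_id = heapq.heappop(heap)
        ordered.append((claim_id, -neg_score))
    return ordered


incoming = [
    {"id": "C-1", "amount": 2_500, "policy_age_days": 800, "prior_claims": 0},
    {"id": "C-2", "amount": 18_000, "policy_age_days": 30, "prior_claims": 2},
    {"id": "C-3", "amount": 12_000, "policy_age_days": 400, "prior_claims": 1},
]
for claim_id, score in prioritize(incoming):
    print(f"{claim_id}: {score:.2f}")
```

With these made-up weights, the large claim on a new policy with prior claims (C-2) goes to the top of the investigation queue and the small claim on a long-standing policy (C-1) goes to the bottom.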