The Next Disruption – Managing the Data Explosion or How to build a Data Lake

Pete Melroy – Solutions Manager

Kam Wong – Solutions Architect

The Next Disruption Managing the Data Explosion or How to ...files.meetup.com/18203116/NY User Group Preso 9-8-2016 Final.pdf · 6 iWay Big Data Integrator - 100% Run “in” Hadoop



Page 2

Information Builders 3i – Solution Architecture

Data sources: Applications, Legacy Systems, Relational/Cubes, Big Data, Columnar/In Memory, Unstructured, Social Media, Web Services, Trading Partners

Integration: Batch ETL, Real-Time ESB

Integrity: Data Quality, Data Governance, Master Data Management

Intelligence: Portal, Embedded, InfoApps™, Mobile, Write-Back, Data Discovery, Reporting, Dashboards, High-Performance Data Store, Location Analytics, In-Document Analytics, Casting and Archiving, Search, Predictive Analytics, Sentiment and Word Analytics, Performance Management

[Other labels in the diagram: Social, Hot, Bad, Feedback]

Page 3

What’s Going On? And Why All the Disruption?

90% of all the data in the world has been generated over the last 2 years

Data output is growing rapidly

[Chart: data output by year, 2009 to 2015]

Page 4

Some Other Things to Think About…

By 2020, every human will generate 1.7 megabytes every… second

570 new websites are created every… minute

The world generates about 2.5 exabytes of data every… day

Mankind has spoken about 5 exabytes of words… ever
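The figures on this slide can be put on a common footing with some quick arithmetic. The sketch below takes the slide's numbers at face value (they are not verified here) and assumes decimal units, with 1 megabyte = 10^6 bytes and 1 exabyte = 10^18 bytes. The variable names are illustrative only.

```python
# Back-of-the-envelope arithmetic on the slide's figures (taken at face value).
# Assumption: decimal units, 1 MB = 10**6 bytes and 1 EB = 10**18 bytes.
MB = 10**6
GB = 10**9
EB = 10**18

SECONDS_PER_DAY = 24 * 60 * 60

# "every human will generate 1.7 megabytes every second"
per_person_per_day_gb = 1.7 * MB * SECONDS_PER_DAY / GB

# "570 new websites are created every minute"
websites_per_day = 570 * 60 * 24

# "2.5 exabytes of data every day" versus "5 exabytes of words ever"
days_to_match_all_speech = (5 * EB) / (2.5 * EB)

print(f"Per person per day: {per_person_per_day_gb:.2f} GB")
print(f"New websites per day: {websites_per_day:,}")
print(f"Days of data output equal to all spoken words: {days_to_match_all_speech:.0f}")
```

Under these assumptions the per-person rate works out to roughly 147 GB per day, the website rate to 820,800 per day, and two days of worldwide data output match the slide's estimate for every word ever spoken.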

Page 5

Traditional in Transition to Modern

Modern: Hadoop, IoT, Streaming, Virtual DW, Data Lake

Traditional: OLTP, OLAP, Data warehouses, Data marts, Point-to-point Integration, EII

[Chart axis labels: Fewer use cases, More use cases]

Page 6

Real-World Strategies for Deploying Big Data

iWay Big Data Integrator - 100% Run “in” Hadoop architecture

Simplified interface

Native Hadoop script generation

Process mgmt. & governance

Simplified easy-to-use interface to integrate in Hadoop

Marshals Hadoop resources and standards

Takes advantage of performance and resource negotiation

Includes sophisticated process management and governance

[Diagram: Traditional applications and data stores; iWay Big Data Integrator (Simplified, modern, native Hadoop integration; Sqoop, Flume…; Avro, JSON); Big Data Hadoop (Any distribution, Any data)]

Page 7

IoT/Cleansing/Predictive – Logical Reference Architecture

[Diagram sections: Data in Motion, Data at Rest. Components: Producers (Streaming Content, IoT, Application Data); Flume (Agent 1, Agent 2, Agent 3); Other Process; Sqoop (RDBMS Data); Kafka Topics (three rows, each labeled 0 1 2); Spark Processing + Data Quality (Job 1, Job 2, Job 3); Spark Data Quality; HDFS; (Predictive) Analytics]
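The Kafka portion of this diagram shows rows labeled 0, 1, 2, which are read here as the partitions of a topic (an assumption, since the slide does not say so). The sketch below illustrates the general idea of key-based partitioning: messages with the same key always land in the same partition, so per-key ordering is preserved. It is plain Python with no Kafka client involved. `zlib.crc32` is a stand-in hash chosen for illustration, and `choose_partition` and `route` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumption: the "0 1 2" rows are the partitions of one topic


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash.

    zlib.crc32 is a stand-in used only for illustration; it is not the hash
    that Kafka clients use.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages):
    """Group (key, value) pairs into per-partition lists, preserving arrival order."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[choose_partition(key)].append((key, value))
    return partitions


if __name__ == "__main__":
    # Hypothetical sensor readings keyed by device id.
    readings = [("sensor-a", 21.5), ("sensor-b", 19.0), ("sensor-a", 21.7), ("sensor-c", 30.2)]
    for partition, items in sorted(route(readings).items()):
        print(partition, items)
```

Because the partition depends only on the key, a downstream job reading one partition sees every reading from a given device in the order it was produced.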

Page 8

Hadoop in the Data Warehouse: Ingest, Transform and Load

Sources: Unstructured, Semi-Structured, Structured, External

Stages: Raw Data, Actionable Data, World Class Analytics

Captions: Ingest without coding; Profile, cleanse, master, etc.; Create w/relevant data

[Diagram labels: Data Lake (Raw Data), Data Ponds, Data Marts, BI Apps, Operational System, Data Swamp]
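The "Profile, cleanse, master, etc." caption names the work that turns raw data into actionable data. As a rough illustration of the first two steps only, the sketch below profiles a batch of raw records for missing values and then cleanses them by trimming whitespace and dropping incomplete records. It is plain Python rather than anything the slide's product generates, and the function names, field names, and records are hypothetical.

```python
def profile(records, fields):
    """Count missing (None or blank) values per field."""
    missing = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
    return missing


def cleanse(records, required):
    """Trim string values and drop records missing any required field."""
    cleaned = []
    for record in records:
        trimmed = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
        if all(trimmed.get(f) not in (None, "") for f in required):
            cleaned.append(trimmed)
    return cleaned


if __name__ == "__main__":
    # Hypothetical raw records as they might sit in the lake.
    raw = [
        {"id": "1", "name": " Ann ", "city": "NYC"},
        {"id": "2", "name": "", "city": "Boston"},
        {"id": "3", "name": "Raj", "city": None},
    ]
    print(profile(raw, ["id", "name", "city"]))
    print(cleanse(raw, required=["id", "name"]))
```

Profiling first shows which fields are unreliable before any records are discarded, which helps decide which fields should count as required.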

Page 9 (slide 12 in the original deck)

Big Data Analytics Demonstration

[Diagram: Information Builders Big Data Analytics. Data Sources: IoT, Flat, ERP, CRM, Enterprise Data (real-time streaming data, structured data). Big Data Integrator Development Tools (deploy / run): Flume Agent (source, channel, sink); Sqoop; Data Wrangling (Mapping, Transform); Data Quality (Cleansing, Match/Merge); Predictive Modeling (SparkR); ELT, data prep, enhance data. Managed Data Lake: HDFS Cluster of Data Nodes. Data Visualization and Business Intelligence: WebFocus; real-time fraud detection analytics dashboard; Users]
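The Flume Agent in this diagram is drawn with its three parts: source, channel, and sink. A minimal sketch of how those parts relate is given below. It is a plain-Python toy model, not Flume itself or its configuration: the source wraps incoming lines as events, the channel buffers them, and the sink drains the channel to a destination. The class and function names are hypothetical, and a list stands in for the real destination.

```python
import queue


class MemoryChannel:
    """Bounded in-memory buffer between a source and a sink."""

    def __init__(self, capacity: int = 100):
        self._events = queue.Queue(maxsize=capacity)

    def put(self, event: dict) -> None:
        self._events.put_nowait(event)  # raises queue.Full when at capacity

    def take(self):
        """Return the oldest buffered event, or None when the channel is empty."""
        try:
            return self._events.get_nowait()
        except queue.Empty:
            return None


def run_source(lines, channel: MemoryChannel) -> None:
    """Source: wrap each incoming line as an event and hand it to the channel."""
    for line in lines:
        channel.put({"headers": {}, "body": line})


def run_sink(channel: MemoryChannel, destination: list) -> int:
    """Sink: drain the channel and deliver event bodies to the destination."""
    delivered = 0
    while (event := channel.take()) is not None:
        destination.append(event["body"])
        delivered += 1
    return delivered


if __name__ == "__main__":
    channel = MemoryChannel(capacity=10)
    store = []  # stand-in for the real destination
    run_source(["claim 1001 received", "claim 1002 received"], channel)
    print(run_sink(channel, store), store)
```

The channel decouples the two ends: the source can keep accepting data while the sink is slow, up to the channel's capacity.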

Page 10 (slide 17 in the original deck)

Insurance Predictive Application: Real-Time Fraud Detection

Page 11 (slide 18 in the original deck)

Insurance Predictive Application

Scores incoming claims for fraud likelihood to prioritize investigations
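To make "scores incoming claims for fraud likelihood to prioritize investigations" concrete, here is a minimal sketch of the general pattern: compute a score between 0 and 1 for each claim, then sort so the highest scores are reviewed first. The slide does not describe the model used in the demonstration, so the logistic formula, the feature names, and the weights below are all assumptions made up for illustration.

```python
import math

# Hypothetical weights for illustration only; a real model would be fitted to data.
WEIGHTS = {"amount_thousands": 0.08, "prior_claims": 0.6, "days_since_policy_start": -0.01}
BIAS = -3.0


def fraud_score(claim: dict) -> float:
    """Return a value in (0, 1) from a logistic function over weighted features."""
    z = BIAS + sum(weight * claim.get(name, 0.0) for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def prioritize(claims: list) -> list:
    """Order claims from highest to lowest score so the riskiest are reviewed first."""
    return sorted(claims, key=fraud_score, reverse=True)


if __name__ == "__main__":
    # Hypothetical incoming claims.
    incoming = [
        {"id": "B", "amount_thousands": 2, "prior_claims": 0, "days_since_policy_start": 900},
        {"id": "A", "amount_thousands": 50, "prior_claims": 3, "days_since_policy_start": 10},
        {"id": "C", "amount_thousands": 20, "prior_claims": 1, "days_since_policy_start": 200},
    ]
    for claim in prioritize(incoming):
        print(claim["id"], round(fraud_score(claim), 3))
```

Sorting by score rather than applying a fixed cutoff matches the slide's stated goal of prioritizing investigations: investigators work down the list as capacity allows.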