Accelerating Success with Rapid Data Integration for the Modern Data Architecture John Kreisa, Hortonworks Lawrence Schwartz, Attunity

Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture


Page 1: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Accelerating Success with Rapid Data Integration for the Modern Data Architecture

John Kreisa, Hortonworks

Lawrence Schwartz, Attunity

Page 2: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Speakers

Lawrence Schwartz, Attunity

John Kreisa, Hortonworks

Page 3: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Customer Momentum

•  230+ customers (as of Q3 2014)

Hortonworks Data Platform

•  Completely open multi-tenant platform for any app & any data.

•  A centralized architecture of consistent enterprise services for resource management, security, operations, and governance.

Partner for Customer Success

•  Open source community leadership focus on enterprise needs

•  Unrivaled world class support

Company

•  Founded in 2011

•  Original 24 architects, developers, operators of Hadoop from Yahoo!

•  600+ Employees

•  1000+ Ecosystem Partners

Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP

Page 4: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Traditional systems under pressure

Challenges

•  Constrains data to app

•  Can’t manage new data

•  Costly to Scale

[Figure: business value over time for industry leaders versus laggards. Traditional data (ERP, CRM, SCM) is joined by new data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs). Data volume: 2.8 zettabytes in 2012, 40 zettabytes in 2020.]

Page 5: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Hadoop emerged as foundation of new data architecture

Apache Hadoop is an open source data platform for managing large volumes of high-velocity, high-variety data

•  Built by Yahoo! to be the heartbeat of its ad & search business

•  Donated to Apache Software Foundation in 2005 with rapid adoption by large web properties & early adopter enterprises

Hadoop Advantages

•  Manages new data paradigm

•  Handles data at scale

•  Cost effective

•  Open source

[Figure: Hadoop stack. Application; Batch Processing (MapReduce); Storage (HDFS)]

Page 6: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

The Modern Data Architecture

[Figure: The Modern Data Architecture.
Sources: OLTP, ERP, CRM systems; documents, emails; web logs, click streams; social networks; machine generated; sensor data; geolocation data.
Data system: repositories (RDBMS, EDW, MPP) and HDP (Data Access, Data Management, YARN, Governance & Integration, Security, Operations).
Applications: data marts, business analytics, visualization & dashboards.
Dev & data tools: build & test.
Operational tools: provision, manage & monitor.
Infrastructure: on premise or in the cloud.]

Page 7: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Hadoop Driver: Cost Optimization

[Figure: cost optimization with HDP 2.2.
Sources: existing systems (ERP, CRM, SCM); clickstream; web & social; geolocation; sensor & machine; server logs; unstructured.
Data systems: Enterprise Data Warehouse (hot, MPP, in-memory); HDP 2.2 (cold data, deeper archive & new sources); ELT.
Analytics: data marts, business analytics, visualization & dashboards.]

•  Archive Data off EDW: Move rarely used data to Hadoop as active archive, store more data longer

•  Offload costly ETL: Free your EDW to perform high-value functions like analytics & operations, not ETL

•  Enrich the value of your EDW: Use Hadoop to refine new data sources, such as web and machine data, for new analytical context
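
The "active archive" idea above can be illustrated with a minimal Python sketch. It is not taken from the webinar: the one-year cutoff, the `last_accessed` field, and the function name `split_hot_cold` are all assumptions chosen for illustration. The sketch only shows the decision of which rows stay in the EDW and which are candidates to move to Hadoop.

```python
from datetime import date, timedelta

# Hypothetical cutoff: rows untouched for more than a year count as "cold".
COLD_AFTER = timedelta(days=365)


def split_hot_cold(rows, today, cold_after=COLD_AFTER):
    """Partition rows into (hot, cold) by how recently each was accessed."""
    hot, cold = [], []
    for row in rows:
        age = today - row["last_accessed"]
        (cold if age > cold_after else hot).append(row)
    return hot, cold


# Illustrative rows; the field names are assumptions, not a real schema.
rows = [
    {"id": 1, "last_accessed": date(2014, 11, 1)},
    {"id": 2, "last_accessed": date(2012, 3, 15)},
    {"id": 3, "last_accessed": date(2014, 6, 30)},
]
hot, cold = split_hot_cold(rows, today=date(2014, 12, 1))
print([r["id"] for r in hot])   # rows that stay in the EDW
print([r["id"] for r in cold])  # rows that are candidates for the Hadoop archive
```

In practice the cutoff would come from query logs or business retention rules rather than a fixed constant.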

Page 8: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

The Modern Data Architecture & Attunity

[Figure: the same Modern Data Architecture diagram as on Page 6 (sources, data system with repositories and HDP, applications, dev & data tools, operational tools, infrastructure), with a Data Integration component added.]

Page 9: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Attunity Corporate Overview

Overview

§  Exchange (Ticker): NASDAQ (ATTU)

§  Headquarters: Burlington, MA

§  Customers: > 2000 in 60 countries

Making Any Data Available Anytime, Anywhere

We Move the Data that Moves Our Customers’ Business

[Figure: moving data to where the data needs to be. Sources: ERP, CRM, POS, legacy, logs, sensors, files. Targets: data warehouse, database, cloud, Hadoop. Uses: analytics / BI, distribution / DR, archiving / testing. Map of global offices.]

Page 10: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

To Use Data, You Must Move it!


Page 11: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Data Needs to Be Moved to Be Useful

»  80% of the work that data scientists put into big data projects is spent on data integration and resolving data quality issues.

Source: “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights,” by Steve Lohr, New York Times, August 17, 2014

Page 12: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Data Integration Remains a Major Challenge

1.  Long rollout

2.  Lots of personnel

3.  Mixed systems

4.  Hard to maintain

5.  Not real-time

Page 13: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Turning Data Into Value

[Figure: turning Data into Value: More Data, Less Time, Less Cost]

The Attunity Solution for Big Data

•  Fully automated, end-to-end. No scripting

•  Fast, high performance integration

•  Optimized for a broad range of platforms

•  Single pane of glass monitoring

•  Real-time change data capture
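
To make "change data capture" concrete, here is a minimal Python sketch of the general idea: after an initial bulk load, only the changes made at the source are applied to the target, instead of reloading every row. This is an illustration of the concept, not Attunity's implementation; the event format `(operation, key, row)` and the function name `apply_changes` are assumptions.

```python
def apply_changes(target, changes):
    """Apply an ordered list of change events to a dict keyed by primary key."""
    for op, key, row in changes:
        if op in ("insert", "update"):
            target[key] = row
        elif op == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


# Target replica as it looks after an initial bulk load.
replica = {1: {"name": "Ann"}, 2: {"name": "Bob"}}

# Hypothetical change events captured from the source since that load.
changes = [
    ("update", 2, {"name": "Robert"}),
    ("insert", 3, {"name": "Cy"}),
    ("delete", 1, None),
]
apply_changes(replica, changes)
print(replica)
```

Only three small events cross the network here, regardless of how many unchanged rows the source table holds, which is why change-based replication suits near-real-time delivery.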

Page 14: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Attunity’s Big Solutions for Big Data

Information availability solutions that deliver competitive advantage

On-Premises

Business Data (Oracle, SQL Server, Teradata, etc.)

Machine and File Data (logs, sensors, files, etc.)

Application Data (SAP, Salesforce, etc.)

Cloud Data (AWS RDS, Redshift, etc.)

Page 15: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Attunity Offerings

BUSINESS DATA: Attunity Replicate and Maestro

»  High-performance data replication software to accelerate and reduce the costs of distributing, sharing and ensuring the availability of data

APPLICATION DATA: Attunity Gold Client

»  Software for SAP that reduces storage requirements, improves the quality and availability of test data, restores development integrity, and helps ensure data security.

MACHINE AND FILE: Attunity RepliWeb, Replicate, and Maestro

»  Attunity Replicate, RepliWeb and Maestro offer highly scalable replication and synchronization for unstructured files, machine data and Hadoop

CLOUD DATA: Attunity CloudBeam

»  Attunity CloudBeam is a SaaS platform offering services for uploading and synchronizing Big Data to, from, and between cloud environments

Page 16: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

‘Sqooping’ Big Data – Loading Data the Hard Way

»  Apache Sqoop: great tool, but not enough

»  Designed for transferring bulk data between Hadoop and databases

»  Not capable of CDC

»  Doesn't optimize network traffic

»  Script-based interface that imports data one table at a time

»  Limited number of standard database connectors

[Figure: Sqoop command line interface]
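
As a rough illustration of the script-based, per-table workflow described above, the following Python sketch builds one `sqoop import` command line for each table. The JDBC URL, user name, table names, and target directory are hypothetical placeholders; the options shown (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--num-mappers`) are standard Sqoop import arguments. The sketch only prints the commands and does not run them.

```python
import shlex


def sqoop_import_command(jdbc_url, username, table, target_root, mappers=4):
    """Build the argument list for one `sqoop import` of a single table."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # prompt for the password on the console
        "--table", table,
        "--target-dir", f"{target_root}/{table}",
        "--num-mappers", str(mappers),
    ]


# Hypothetical tables and connection details, for illustration only.
tables = ["customers", "orders", "invoices"]
commands = [
    sqoop_import_command("jdbc:mysql://db.example.com/sales", "etl_user", t, "/data/raw")
    for t in tables
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each table needs its own command, and every run re-transfers the selected rows in bulk, which is the maintenance burden the slide is pointing at.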

Page 17: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Attunity Replicate Architecture

»  Advanced Monitoring and Control

»  Click-to-Replicate Design

»  Fast Loading and Real-Time CDC

»  Broadest Platform Support

»  Non-intrusive Architecture

Move Any Data, Any Time, Any Where.

Page 18: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Use Case: Cable Provider

Modern Data Architecture with Hadoop: The Journey to the Data Lake

Attunity Confidential

Click-2-Replicate Design. Drag. Drop. Done.

[Figure: diagram elements: Databases and Data Feed Sources (CSV); Bulk Load and Change Data; Data Refresh and Data Append; ODS; Data Lake; Business Units (Finance, Support, Marketing, Sales, Engineering)]

Page 19: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Use Case: Managed Health Care – Creating Golden Data Set

Click-2-Replicate Design. Drag. Drop. Done.

[Figure: diagram elements: Databases and Data Feed Sources (CSV); Bulk Load and Change Data; Data Refresh and Data Append; ODS; ETL; Staging Area (Business Transformation Rules Applied); Ad-hoc Analytics; BI Reporting; Visualization & Analytics]

Page 20: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Use Case: Financial Services Institution – Fraud Detection

[Figure: diagram elements: Data Feed Sources; Bulk Load and Change Data; CDC; Attunity Maestro; Data Refresh and Data Append; ODS (PostgreSQL); ETL; Staging Area (Business Transformation Rules Applied); EDW/Data Mart; Ad-hoc Analytics; BI Reporting; Visualization & Analytics]

Page 21: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

     

Use Case: Sales Management Software Data Consolidation

[Figure: diagram elements: Attunity Maestro; Maestro Nodes; Headquarters (HQ); Regional Data Centers (California, New York); Replicate Servers; Data From SaaS Customers (Customer 1 through Customer 5, …); Data Lake]

Page 22: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Who’s Our Lucky Winner?

Page 23: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Next Steps

Download the Hortonworks Attunity Paper “The Modern Data Architecture and Automating Data Transfer”

Hortonworks.com/partner/Attunity/

Learn Hadoop: Download the Sandbox

Hortonworks.com/sandbox/

Learn More about Attunity & Hortonworks

Attunity.com/hortonworks

Hortonworks.com/partner/Attunity/

Page 24: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

Thank You!

Page 25: Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Modern Data Architecture

HDP delivers a completely open data platform

Hortonworks Data Platform provides Hadoop for the Enterprise: a centralized architecture of core enterprise services, for any application and any data.

Completely Open

•  HDP incorporates every element required of an enterprise data platform: data storage, data access, governance, security, operations

Hortonworks Data Platform 2.2

[Figure: HDP 2.2 component stack.
YARN: Data Operating System (Cluster Resource Management)
HDFS (Hadoop Distributed File System)
Governance: Apache Falcon
Batch, interactive & real-time data access: Apache Pig, Apache Hive, Cascading, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm
Also shown: Apache Sqoop, Apache Flume, Apache Kafka
Security: Apache Ranger, Apache Knox, Apache Falcon
Operations: Apache Ambari, Apache Zookeeper, Apache Oozie]