Accelerating Success with Rapid Data Integration for the Modern Data Architecture
John Kreisa, Hortonworks
Lawrence Schwartz, Attunity
Speakers
Lawrence Schwartz, Attunity
John Kreisa, Hortonworks
Customer Momentum
• 230+ customers (as of Q3 2014)
Hortonworks Data Platform
• Completely open multi-tenant platform for any app & any data.
• A centralized architecture of consistent enterprise services for resource management, security, operations, and governance.
Partner for Customer Success
• Open source community leadership focus on enterprise needs
• Unrivaled world class support
• Founded in 2011
• Original 24 architects, developers, operators of Hadoop from Yahoo!
• 600+ Employees
• 1000+ Ecosystem Partners
Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP
Traditional systems under pressure
Challenges
• Constrains data to app
• Can’t manage new data
• Costly to scale
[Figure labels: Business Value; data volume of 2.8 zettabytes in 2012 and 40 zettabytes in 2020; Industry Leaders vs. Laggards. Traditional data: ERP, CRM, SCM. New data: Clickstream, Geolocation, Web Data, Internet of Things, Docs and emails, Server logs.]
Hadoop emerged as foundation of new data architecture
Apache Hadoop is an open source data platform for managing large volumes of high velocity and variety of data
• Built by Yahoo! to be the heartbeat of its ad & search business
• Donated to Apache Software Foundation in 2005 with rapid adoption by large web properties & early adopter enterprises
Hadoop Advantages
• Manages new data paradigm
• Handles data at scale
• Cost effective
• Open source
Application
Storage HDFS
Batch Processing MapReduce
The Modern Data Architecture
[Diagram labels]
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
DATA SYSTEM: Repositories (RDBMS, EDW, MPP); HDP (Governance & Integration, Security, Operations, Data Access, Data Management, YARN)
APPLICATIONS: Data Marts; Business Analytics; Visualization & Dashboards
OPERATIONAL TOOLS: Provision, Manage & Monitor
DEV & DATA TOOLS: Build & Test
INFRASTRUCTURE: On Premise or in the Cloud
Hadoop Driver: Cost Optimization
[Diagram labels]
SOURCES: Existing Systems (ERP, CRM, SCM); Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured
DATA SYSTEMS: Enterprise Data Warehouse (Hot; MPP; In-Memory); ELT; HDP 2.2 (Cold Data, Deeper Archive & New Sources)
ANALYTICS: Data Marts; Business Analytics; Visualization & Dashboards
• Archive data off EDW: Move rarely used data to Hadoop as active archive, store more data longer
• Offload costly ETL: Free your EDW to perform high-value functions like analytics & operations, not ETL
• Enrich the value of your EDW: Use Hadoop to refine new data sources, such as web and machine data, for new analytical context
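As a rough illustration of the "archive data off EDW" idea, the sketch below splits rows into a hot set that stays in the warehouse and a cold set that would move to Hadoop. The `last_accessed` field, the one-year threshold, and the function name are assumptions made for this example; the slides do not specify how cold data is identified.

```python
from datetime import date, timedelta

# Hypothetical threshold: rows untouched for longer than this count as "cold".
COLD_AFTER = timedelta(days=365)


def split_hot_cold(rows, today, cold_after=COLD_AFTER):
    """Partition rows into (hot, cold) lists by time since last access."""
    hot, cold = [], []
    for row in rows:
        age = today - row["last_accessed"]
        (cold if age > cold_after else hot).append(row)
    return hot, cold


# Illustrative rows only; a real warehouse would supply access metadata.
rows = [
    {"id": 1, "last_accessed": date(2014, 9, 1)},
    {"id": 2, "last_accessed": date(2012, 3, 15)},
    {"id": 3, "last_accessed": date(2013, 1, 10)},
]

hot, cold = split_hot_cold(rows, today=date(2014, 10, 1))
print("stay in EDW:", [r["id"] for r in hot])
print("archive to Hadoop:", [r["id"] for r in cold])
```

In practice the hot/cold rule would more likely come from query logs or partition dates than from a per-row field, but the partitioning step has the same shape.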
The Modern Data Architecture & Attunity
[Diagram labels]
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
Data Integration
DATA SYSTEM: Repositories (RDBMS, EDW, MPP); HDP (Governance & Integration, Security, Operations, Data Access, Data Management, YARN)
APPLICATIONS: Data Marts; Business Analytics; Visualization & Dashboards
OPERATIONAL TOOLS: Provision, Manage & Monitor
DEV & DATA TOOLS: Build & Test
INFRASTRUCTURE: On Premise or in the Cloud
Attunity Corporate Overview
Overview
• Exchange (Ticker): NASDAQ (ATTU)
• Headquarters: Burlington, MA
• Customers: > 2000 in 60 countries
• Global Offices
Making Any Data Available Anytime, Anywhere
We Move the Data that Moves Our Customers’ Business
[Diagram labels: sources ERP, CRM, POS, Legacy, Logs, Sensors, Files; "To Where the Data Needs to Be": Data Warehouse, Database, Cloud, Hadoop; uses: Analytics / BI, Distribution / DR, Archiving / Testing]
To Use Data, You Must Move it!
Data Needs to Be Moved to Be Useful
» 80% of the work that data scientists put into big data projects is spent on data integration and resolving data quality issues.
Source: “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights,” by Steve Lohr, New York Times, August 17, 2014
Data Integration Remains a Major Challenge
1. Long rollout
2. Lots of personnel
3. Mixed systems
4. Hard to maintain
5. Not real-time
Turning Data Into Value
More Data
Less Time
Less Cost
Data Value
The Attunity Solution for Big Data
• Fully automated, end-to-end. No scripting
• Fast, high performance integration
• Optimized for a broad range of platforms
• Single pane of glass monitoring
• Real-time change data capture
Attunity’s Big Solutions for Big Data
Information availability solutions that deliver competitive advantage
On-Premises
Business Data (Oracle, SQL Server, Teradata, etc…)
Machine and File Data (logs, sensors, files, etc…)
Application Data (SAP, Salesforce, etc…)
Cloud Data (AWS RDS, Redshift, etc…)
Attunity Offerings
BUSINESS DATA: Attunity Replicate and Maestro
» High-performance data replication software to accelerate and reduce the costs of distributing, sharing and ensuring the availability of data
APPLICATION DATA: Attunity Gold Client
» Software for SAP that reduces storage requirements, improves the quality and availability of test data, restores development integrity, and helps ensure data security.
MACHINE AND FILE: Attunity RepliWeb, Replicate, and Maestro
» Attunity Replicate, RepliWeb and Maestro offer highly scalable replication and synchronization for unstructured files, machine data and Hadoop
CLOUD DATA: Attunity CloudBeam
» Attunity CloudBeam is a SaaS platform offering services for uploading and synchronizing Big Data to, from, and between cloud environments
‘Sqooping’ Big Data – Loading Data the Hard Way
» Apache Sqoop: great tool, but not enough
» Designed for transferring bulk data between Hadoop and databases
» Not capable of CDC
» Doesn't optimize network traffic
» Script-based interface, importing data one table at a time
» Limited number of standard database connectors
[Figure: Sqoop command line interface]
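To make the "one table at a time" point concrete, here is a minimal sketch of the kind of wrapper script teams tend to write around Sqoop. It only builds and prints `sqoop import` command lines and does not run them. The JDBC URL, user name, password-file path, table names, and target directory are all made-up placeholders; the flags used (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--num-mappers`) are standard Sqoop import arguments.

```python
import shlex


def sqoop_import_command(jdbc_url, username, password_file, table,
                         target_root, mappers=4):
    """Build the argument list for importing a single table with Sqoop."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", f"{target_root}/{table.lower()}",
        "--num-mappers", str(mappers),
    ]


# Hypothetical source tables; each one needs its own import invocation.
tables = ["CUSTOMERS", "ORDERS", "INVOICES"]

commands = [
    sqoop_import_command(
        jdbc_url="jdbc:oracle:thin:@//db.example.com:1521/ORCL",  # placeholder
        username="etl_user",                                      # placeholder
        password_file="/user/etl/.sqoop_pw",                      # placeholder
        table=table,
        target_root="/data/raw",
    )
    for table in tables
]

for cmd in commands:
    # Quote each argument so the printed line is safe to paste into a shell.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Every additional table means another generated command to schedule and monitor, which is the maintenance burden the slide is pointing at.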
Attunity Replicate Architecture
» Advanced Monitoring and Control
» Click-to-Replicate Design
» Fast Loading and Real-Time CDC
» Broadest Platform Support
» Non-intrusive Architecture
Move Any Data, Any Time, Any Where.
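The pairing of fast loading with real-time CDC can be pictured with a generic sketch: one full copy of the source, then a stream of change events applied to the target. This is a conceptual illustration only, not Attunity Replicate's implementation; the event format (`op` plus `row`) and the function names are assumptions made for the example.

```python
def bulk_load(source_rows):
    """Initial full copy: index every source row by its primary key."""
    return {row["id"]: dict(row) for row in source_rows}


def apply_change(target, event):
    """Apply one change event (insert, update, or delete) to the target."""
    op, row = event["op"], event["row"]
    if op in ("insert", "update"):
        target[row["id"]] = dict(row)
    elif op == "delete":
        target.pop(row["id"], None)
    else:
        raise ValueError(f"unknown operation: {op}")


# Hypothetical source table at the moment of the bulk load.
source = [
    {"id": 1, "plan": "basic"},
    {"id": 2, "plan": "premium"},
]
target = bulk_load(source)

# Hypothetical changes captured after the bulk load finished.
changes = [
    {"op": "update", "row": {"id": 1, "plan": "premium"}},
    {"op": "insert", "row": {"id": 3, "plan": "basic"}},
    {"op": "delete", "row": {"id": 2}},
]
for event in changes:
    apply_change(target, event)

print(target)
```

Only the changed rows cross the wire after the initial load, which is why CDC keeps a target current without repeated full reloads.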
Use Case: Cable Provider Modern Data Architecture with Hadoop
The Journey to the Data Lake
Attunity Confidential
[Diagram labels: Databases; Data Feed Sources (CSV); Bulk Load; Change Data; Click-2-Replicate Design. Drag. Drop. Done.; Data Refresh; Data Append; ODS; Data Lake; Business Units: Finance, Support, Marketing, Sales, Engineering]
Use Case: Managed Health Care – Creating Golden Data Set
[Diagram labels: Databases; Data Feed Sources (CSV); Bulk Load; Change Data; Click-2-Replicate Design. Drag. Drop. Done.; Data Refresh; Data Append; ODS; ETL; Staging Area (Business Transformation Rules Applied); Ad-hoc Analytics; BI Reporting; Visualization & Analytics]
Use Case: Financial Services Institution – Fraud Detection
[Diagram labels: Data Feed Sources; Bulk Load; Change Data; CDC; ATTUNITY MAESTRO; ODS (PostgreSQL); Data Refresh; Data Append; ETL; Staging Area (Business Transformation Rules Applied); EDW/Data Mart; Ad-hoc Analytics; BI Reporting; Visualization & Analytics]
Use Case: Sales Management Software Data Consolidation
[Diagram labels: Data From SaaS Customers (Customer 1, Customer 2, Customer 3, Customer 4, Customer 5, …); Replicate Server (repeated); ATTUNITY MAESTRO with three MAESTRO NODEs; Regional Data Center (California, New York); Headquarters (HQ); Data Lake]
Who’s Our Lucky Winner?
Next Steps
Download the Hortonworks Attunity Paper “The Modern Data Architecture and Automating Data Transfer” Hortonworks.com/partner/Attunity/
Learn Hadoop – Download the Sandbox
Hortonworks.com/sandbox/
Learn More about Attunity & Hortonworks
Attunity.com/hortonworks Hortonworks.com/partner/Attunity/
Thank You!
HDP delivers a completely open data platform
Hortonworks Data Platform provides Hadoop for the Enterprise: a centralized architecture of core enterprise services, for any application and any data.
Completely Open
• HDP incorporates every element required of an enterprise data platform: data storage, data access, governance, security, operations
Hortonworks Data Platform 2.2
YARN: Data Operating System (Cluster Resource Management)
HDFS (Hadoop Distributed File System)
GOVERNANCE: Apache Falcon, Apache Sqoop, Apache Flume, Apache Kafka
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Apache Pig, Apache Hive, Cascading, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm
SECURITY: Apache Ranger, Apache Knox, Apache Falcon
OPERATIONS: Apache Ambari, Apache ZooKeeper, Apache Oozie