Overview of RedPoint Data Management for Hortonworks Hadoop
2014
RedPoint Global Inc. April 13, 2023 © Confidential
What is Hadoop/Hadoop 2.0?
Hadoop 1.0
• All operations based on MapReduce
• Intrinsic inconsistency of code-based solutions
• Highly skilled and expensive resources needed
• Third-party applications constrained by the need to generate code
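To make "all operations based on MapReduce" concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce phases for a word count. It is plain Python rather than Hadoop code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions and do not come from any Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on in-memory data."""
    return reduce_phase(shuffle(map_phase(lines)))
```

Even this simple task has to be expressed as hand-written map and reduce functions, which is the coding burden the bullets above describe.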
Lower-cost scaling
No need for structure
Ease of data capture
Hadoop 2.0
• Introduction of YARN: “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.”
• Mature applications can now operate directly on Hadoop
• Reduced skill requirements and increased consistency
Challenges to Hadoop Adoption
Skills Gap
• Severe shortage of MapReduce-skilled resources
• Resources are very expensive and hard to retain
• Inconsistent skills lead to inconsistent results
• Underutilizes existing resources
• Prevents broad leverage of investments across the enterprise
Maturity & Governance
• A nascent technology ecosystem around Hadoop
• Emerging technologies only address narrow slivers of functionality
• New applications are not enterprise class
• Legacy applications have built short-term capabilities
Data Into Information
• Data is not useful in its raw state; it must be turned into information
• A benefit of Hadoop is that the same data can be used from many perspectives
• Analysts must now structure the data based on its intended use
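The idea that the same raw data can be used from many perspectives, with structure applied at the point of use, can be sketched as follows. The pipe-delimited event format, the sample records, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical raw clickstream lines: timestamp|customer_id|page
RAW_EVENTS = [
    "2014-03-01T10:00:00|c1|/home",
    "2014-03-01T10:00:05|c2|/home",
    "2014-03-01T10:01:00|c1|/pricing",
]


def parse(line: str) -> Dict[str, str]:
    """Apply structure to one raw line at read time."""
    timestamp, customer, page = line.split("|")
    return {"timestamp": timestamp, "customer": customer, "page": page}


def visits_per_customer(raw: Iterable[str]) -> Counter:
    """Perspective 1: how active is each customer?"""
    return Counter(parse(line)["customer"] for line in raw)


def hits_per_page(raw: Iterable[str]) -> Counter:
    """Perspective 2: how popular is each page?"""
    return Counter(parse(line)["page"] for line in raw)
```

Both functions read the same unmodified raw lines; only the structure imposed by the analyst differs.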
How RedPoint Helps
First YARN-compliant ETL/data quality toolset on the market: brings together Big Data and traditional data to create Big Information!
Ranked #1 in:
• Customer or Party Data
• Processing Speed
• Match Quality
• Ease of Use
The power to make your data the biggest asset your organization has
RedPoint in a Hortonworks environment
Diagram layers: Applications, Data Systems, Sources
Sources:
• OLTP, ERP, CRM Systems
• Documents, Emails
• Web Logs, Click Streams
• Social Networks
• Machine Generated
• Sensor Data
• Geolocation Data
Repositories:
• RDBMS
• EDW
• MPP
Platform layers:
• Governance & Integration
• Security
• Operations
• Data Access
• Data Management
RedPoint Data Quality and Data Integration
One application, one graphical user interface for traditional and Big Data
Functions: ELT, ETL, Cleanse, Match, De-dupe, Merge/Purge, Household, Partition, Parse, Append, Standardize, Key, Automate, Monitor, Notify
• Pre-built adapters and ODBC drivers
• Pure YARN application
• No MapReduce needed
• No in-cluster installation
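As a rough illustration of what data quality functions such as Standardize, Key, Match, and De-dupe mean in general, here is a minimal sketch in plain Python. It is not RedPoint's implementation or API; the record fields, the exact-match rule on email, and the function names are all assumptions made for the example.

```python
import re
from typing import Dict, Iterable, List

Record = Dict[str, str]


def standardize(record: Record) -> Record:
    """Standardize: normalize whitespace and casing of name and email."""
    name = re.sub(r"\s+", " ", record["name"]).strip().title()
    email = record["email"].strip().lower()
    return {"name": name, "email": email}


def match_key(record: Record) -> str:
    """Key: build the value used to decide that two records match.

    Assumption: an exact match on standardized email is sufficient.
    """
    return record["email"]


def dedupe(records: Iterable[Record]) -> List[Record]:
    """De-dupe: keep the first standardized record seen for each key."""
    survivors: Dict[str, Record] = {}
    for raw in records:
        clean = standardize(raw)
        survivors.setdefault(match_key(clean), clean)
    return list(survivors.values())
```

Real matching tools typically go well beyond exact keys (for example, fuzzy or probabilistic matching); the sketch only shows how the steps chain together.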
Typical Hadoop architecture without RedPoint
• Monitoring and Management Tools: AMBARI
• Data Sources: DBs, JMS Queues, Files
• RDBMS, EDW
• Source Data: Sensor Logs, Clickstream, Flat Files, Unstructured, Sentiment, Customer, Inventory
• Load: SQOOP, FLUME, WebHDFS, NFS
• Load: SQOOP/Hive, WebHDFS
• Data Refinement: MAPREDUCE, HIVE, PIG
• Structure: HCATALOG (metadata services)
• Interactive: HIVE Server2
• Other labels: REST, HTTP, STREAM
• Query/Visualization/Reporting/Analytical Tools and Apps
• YARN and HDFS, nodes 1 to n
Typical Hadoop architecture with RedPoint
• Monitoring and Management Tools: AMBARI
• Data Sources: DBs, JMS Queues, Files
• RDBMS, EDW
• Source Data: Sensor Logs, Clickstream, Flat Files, Unstructured, Sentiment, Customer, Inventory
• Load: SQOOP, WebHDFS, Flume, NFS
• Load: SQOOP/Hive, WebHDFS
• Data Refinement: MAPREDUCE, HIVE, PIG
• Structure: HCATALOG (metadata services)
• Interactive: HIVE Server2
• Other labels: REST, HTTP, STREAM
• Query/Visualization/Reporting/Analytical Tools and Apps
• YARN and HDFS, nodes 1 to n