© Hortonworks Inc. 2013
Hadoop in the Enterprise
Jeff Markham, Technical Director, APAC, Hortonworks
Modern Architecture with Hadoop 2
Hadoop Wave ONE: Web-scale Batch Apps
[Figure: technology adoption life cycle. Axes: time vs. relative % customers. Segments: Innovators, technology enthusiasts; Early adopters, visionaries; Early majority, pragmatists; Late majority, conservatives; Laggards, skeptics. "The CHASM" is marked on the curve. Labels: Customers want technology & performance; Customers want solutions & convenience. 2006 to 2012: Web-Scale Batch Applications. Source: Geoffrey Moore - Crossing the Chasm]
Hadoop Wave TWO: Broad Enterprise Apps
[Figure: technology adoption life cycle. Axes: time vs. relative % customers. Segments: Innovators, technology enthusiasts; Early adopters, visionaries; Early majority, pragmatists; Late majority, conservatives; Laggards, skeptics. "The CHASM" is marked on the curve. Labels: Customers want technology & performance; Customers want solutions & convenience. 2013 & Beyond: Batch, Interactive, Online, Streaming, etc. Source: Geoffrey Moore - Crossing the Chasm]
2.0 Architected for the Broad Enterprise
Hadoop 2.0 Key Highlights
Single Cluster, Many Workloads: BATCH, INTERACTIVE, ONLINE, STREAMING
Enterprise Requirements and the matching HDP 2.0 Features:
• ZERO downtime: Rolling Upgrades
• Multi Data Center: Disaster Recovery
• Point in time Recovery: Snapshots
• Reliability: Full Stack HA
• Interactive Query: Hive on Tez
• Mixed workloads: YARN
The 1st Generation of Hadoop: Batch
HADOOP 1.0: Built for Web-Scale Batch Apps
• All other usage patterns must leverage that same infrastructure
• Forces the creation of silos for managing mixed workloads
[Diagram: separate single-application stacks labeled BATCH, INTERACTIVE, and ONLINE, drawn over HDFS]
A Transition From Hadoop 1 to 2
HADOOP 1.0
HDFS (redundant, reliable storage)
MapReduce (cluster resource management & data processing)
HADOOP 2.0
HDFS (redundant, reliable storage)
YARN (cluster resource management)
MapReduce (data processing)
Others (data processing)
The Enterprise Requirement: Beyond Batch
For Hadoop to become an enterprise-viable data platform, customers have told us they want to store ALL DATA in one place and interact with it in MULTIPLE WAYS, simultaneously and with predictable levels of service.
[Diagram: workloads running over HDFS (Redundant, Reliable Storage): BATCH, INTERACTIVE, STREAMING, GRAPH, IN-MEMORY, HPC MPI, ONLINE, OTHER]
YARN: Taking Hadoop Beyond Batch
• Created to manage resource needs across all uses
• Ensures predictable performance & QoS for all apps
• Enables apps to run “IN” Hadoop rather than “ON”
– Key to leveraging all other common services of the Hadoop platform: security, data lifecycle management, etc.
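One way to picture "predictable performance & QoS for all apps" is a guaranteed share of cluster memory per workload, with idle capacity lent to workloads that want more. The following is a minimal sketch under that assumption only. The function name `allocate`, the queue names, and the percentages are hypothetical; this is not YARN's scheduler or its API.

```python
def allocate(total_mb, guaranteed_pct, demand_mb):
    """Split cluster memory among workload queues (toy model).

    Each queue first receives min(demand, guarantee). Memory left over
    is then lent to queues that still want more, in queue-name order.
    """
    grant = {}
    for queue, pct in guaranteed_pct.items():
        guarantee = total_mb * pct // 100
        grant[queue] = min(demand_mb.get(queue, 0), guarantee)

    spare = total_mb - sum(grant.values())
    for queue in sorted(grant):
        unmet = demand_mb.get(queue, 0) - grant[queue]
        extra = min(unmet, spare)
        grant[queue] += extra
        spare -= extra
    return grant


# Hypothetical queues and numbers, chosen only for illustration.
shares = {"batch": 50, "interactive": 30, "online": 20}
demand = {"batch": 80_000, "interactive": 10_000, "online": 20_000}
result = allocate(100_000, shares, demand)
print(result)
```

Here the interactive queue uses less than its guarantee, so the batch queue borrows the idle memory while the online queue still gets its full share.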
Applications Run Natively IN Hadoop
HDFS2 (Redundant, Reliable Storage)
YARN (Cluster Resource Management)
BATCH (MapReduce)
INTERACTIVE (Tez)
STREAMING (Storm, S4,…)
GRAPH (Giraph)
IN-MEMORY (Spark)
HPC MPI (OpenMPI)
ONLINE (HBase)
OTHER (Search) (Weave…)
Old School Hadoop: MapReduce
New School Hadoop with YARN
[Diagram: Clients, a ResourceManager, and three NodeManagers hosting Containers and App Mstr (ApplicationMaster) processes. Arrow labels: Job Submission, Node Status, Resource Request, MapReduce Status]
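The components named above (ResourceManager, NodeManager, App Mstr, Container) can be sketched as a toy model of the request flow. This is only an illustration: the class and method names below are hypothetical and are not the YARN client API, and the first-fit placement and the 512 MB master container are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One worker node: reports free memory and hosts containers."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Toy stand-in: tracks node capacity and grants containers."""

    def __init__(self, nodes):
        self.nodes = nodes

    def request_container(self, owner, mb):
        # First fit: grant on the first node with enough free memory.
        for node in self.nodes:
            if node.free_mb >= mb:
                node.free_mb -= mb
                node.containers.append((owner, mb))
                return node.name
        return None  # no capacity available right now


class ApplicationMaster:
    """Per-application coordinator that asks the RM for task containers."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm

    def run_tasks(self, task_count, mb_per_task):
        return [self.rm.request_container(self.app_id, mb_per_task)
                for _ in range(task_count)]


def submit_job(rm, app_id, task_count, mb_per_task, master_mb=512):
    # Job submission: the master itself is placed in a container first.
    master_node = rm.request_container(app_id + "-master", master_mb)
    if master_node is None:
        return None, []
    master = ApplicationMaster(app_id, rm)
    return master_node, master.run_tasks(task_count, mb_per_task)


rm = ResourceManager([NodeManager("nm1", 2048), NodeManager("nm2", 2048)])
first = submit_job(rm, "mr-job", task_count=3, mb_per_task=1024)
second = submit_job(rm, "tez-job", task_count=1, mb_per_task=1024)
print(first, second)
```

The point of the split is visible in the sketch: the ResourceManager only hands out capacity, while each application's own master decides what to run, so different processing frameworks can share the same nodes.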
5 Key Benefits of YARN
1. Scale!
2. Compatibility with MapReduce.
3. Improved cluster utilization.
4. New Programming Models
5. Agility
Apache Tez
• An alternate data processing framework to MapReduce
• Improves performance of low-latency applications
SQL-IN-Hadoop with Apache Hive
• Apache Hive: first application to use YARN
• Hive on Tez optimizes resource usage for Hive queries to improve performance
– Apache Hive is the standard for SQL interaction in Hadoop (most applications claim Hive compatibility today)
– Apache Tez: a general-purpose processing framework, optimized for YARN, for existing Hadoop applications
Stinger Initiative Simple Focus
[Diagram: Business Analytics and Custom Apps on top of SQL, HIVE, MAP REDUCE and TEZ, YARN, and HDFS2, all within Hadoop]
Stinger Phase 3 • Vector Query • Buffer Cache • Query Planner
Stinger Phase 2 • YARN Resource Mgmnt • Hive on Apache Tez • Query Service (always on)
Stinger Phase 1 • Base Optimizations • SQL Analytics • ORCFile Format
1. Improve existing tools & preserve investments
2. Enable Hive to support interactive workloads
Increased SQL Compatibility
100x Performance Improvement
SQL Compliance Highlights
Hive: More SQL & 100X Faster
Stinger Phase 3 • Vector Query • Buffer Cache • Query Planner
Stinger Phase 2 • YARN Resource Mgmnt • Hive on Apache Tez • Query Service
Stinger Phase 1 • Base Optimizations • SQL Analytics • ORCFile Format
We Are Here
SQL features (slide legend: Done in Hive 0.11 / Work Started):
• CHAR
• VARCHAR
• DATE
• DECIMAL
• Sub-queries for IN/NOT IN, HAVING
• EXISTS / NOT EXISTS
• INTERSECT, EXCEPT
• UNION DISTINCT and UNION outside of subquery
• ROLLUP and CUBE
• Windowing functions (OVER, RANK, etc.)
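To make "Windowing functions (OVER, RANK, etc.)" concrete, here is a small sketch of what a `RANK() OVER (PARTITION BY ...)` query computes. It runs against SQLite through Python's standard `sqlite3` module, used here only as a convenient stand-in for Hive. It assumes the bundled SQLite is version 3.25 or later, and the table name, columns, and rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("APAC", "ana", 300), ("APAC", "bo", 500),
     ("EMEA", "cy", 200), ("EMEA", "di", 200)],
)

# Rank each rep within their region by amount; ties share a rank.
rows = conn.execute("""
    SELECT region, rep, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
    ORDER BY region, rnk, rep
""").fetchall()

for row in rows:
    print(row)
```

Unlike GROUP BY, the window keeps every input row and attaches the per-partition rank to it, which is why this class of function matters for analytic SQL workloads.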
Hive’s Performance Trajectory
http://hortonworks.com/blog/delivering-on-stinger-a-phase-3-progress-update/
Making Hadoop Enterprise Ready
Thank You!
http://hortonworks.com/sandbox