Page 1: Hadoop in the Enterprise - README

© Hortonworks Inc. 2013

Hadoop in the Enterprise

Jeff Markham Technical Director, APAC Hortonworks

Modern Architecture with Hadoop 2

Page 2: Hadoop in the Enterprise - README

Hadoop Wave ONE: Web-scale Batch Apps

[Figure: technology adoption lifecycle curve. X-axis: time; y-axis: relative % of customers. Source: Geoffrey Moore, Crossing the Chasm]

• Innovators, technology enthusiasts
• Early adopters, visionaries
• The CHASM
• Early majority, pragmatists
• Late majority, conservatives
• Laggards, skeptics

Customers want technology & performance

Customers want solutions & convenience

2006 to 2012: Web-Scale Batch Applications

Page 3: Hadoop in the Enterprise - README

Hadoop Wave TWO: Broad Enterprise Apps

[Figure: technology adoption lifecycle curve. X-axis: time; y-axis: relative % of customers. Source: Geoffrey Moore, Crossing the Chasm]

• Innovators, technology enthusiasts
• Early adopters, visionaries
• The CHASM
• Early majority, pragmatists
• Late majority, conservatives
• Laggards, skeptics

Customers want technology & performance

Customers want solutions & convenience

2013 & Beyond: Batch, Interactive, Online, Streaming, etc.

Page 4: Hadoop in the Enterprise - README

2.0 Architected for the Broad Enterprise

Hadoop 2.0 Key Highlights

Single Cluster, Many Workloads: BATCH, INTERACTIVE, ONLINE, STREAMING

HDP 2.0 Features      Enterprise Requirements
----------------      -----------------------
Rolling Upgrades      ZERO downtime
Disaster Recovery     Multi Data Center
Snapshots             Point in time Recovery
Full Stack HA         Reliability
Hive on Tez           Interactive Query
YARN                  Mixed workloads

Page 5: Hadoop in the Enterprise - README

The 1st Generation of Hadoop: Batch

HADOOP 1.0: Built for Web-Scale Batch Apps

[Diagram: a Single App BATCH stack on HDFS, repeated as separate silos alongside Single App INTERACTIVE and Single App ONLINE]

• All other usage patterns must leverage that same infrastructure
• Forces the creation of silos for managing mixed workloads

Page 6: Hadoop in the Enterprise - README

A Transition From Hadoop 1 to 2

HADOOP 1.0

• HDFS (redundant, reliable storage)
• MapReduce (cluster resource management & data processing)

Page 7: Hadoop in the Enterprise - README

A Transition From Hadoop 1 to 2

HADOOP 1.0

• HDFS (redundant, reliable storage)
• MapReduce (cluster resource management & data processing)

HADOOP 2.0

• HDFS (redundant, reliable storage)
• YARN (cluster resource management)
• MapReduce (data processing)
• Others (data processing)
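The split shown on this slide can be sketched in a few lines of Python. This is a toy model only, not Hadoop code: `ResourceLayer`, `word_count`, `longest_line`, and `run` are hypothetical names invented for illustration. The point it tries to show is that once resource management is its own layer, a MapReduce-style engine and some other engine can share the same pool of capacity.

```python
from collections import Counter
from typing import Callable, Dict, List


class ResourceLayer:
    """Toy stand-in for the resource-management layer: it only hands out capacity."""

    def __init__(self, total_containers: int) -> None:
        self.free = total_containers

    def allocate(self, wanted: int) -> int:
        # Grant as many containers as are free, up to the number requested.
        granted = min(wanted, self.free)
        self.free -= granted
        return granted

    def release(self, count: int) -> None:
        self.free += count


def word_count(records: List[str], containers: int) -> Dict[str, int]:
    """MapReduce-style engine: count words per split (map), then merge (reduce)."""
    splits = [records[i::containers] for i in range(containers)]
    partials = [Counter(w for line in split for w in line.split()) for split in splits]
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


def longest_line(records: List[str], containers: int) -> str:
    """A second, non-MapReduce engine that uses the same resource layer."""
    return max(records, key=len)


def run(engine: Callable, records: List[str], resources: ResourceLayer, wanted: int):
    """Ask the resource layer for capacity, run the engine, then give capacity back."""
    granted = resources.allocate(wanted)
    if granted == 0:
        raise RuntimeError("no free containers")
    try:
        return engine(records, granted)
    finally:
        resources.release(granted)


cluster = ResourceLayer(total_containers=4)
data = ["a b a", "b c"]
counts = run(word_count, data, cluster, wanted=3)
longest = run(longest_line, data, cluster, wanted=1)
print(counts, longest)
```

In the Hadoop 1.0 picture, `allocate` and `word_count` would live inside the same component; here neither engine knows how capacity is tracked, which is the separation the slide attributes to YARN.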

Page 8: Hadoop in the Enterprise - README

The Enterprise Requirement: Beyond Batch

For Hadoop to become an enterprise-viable data platform, customers have told us they want to store ALL DATA in one place and interact with it in MULTIPLE WAYS, simultaneously and with predictable levels of service.

HDFS (Redundant, Reliable Storage)

BATCH, INTERACTIVE, STREAMING, GRAPH, IN-MEMORY, HPC MPI, ONLINE, OTHER

Page 9: Hadoop in the Enterprise - README

YARN: Taking Hadoop Beyond Batch

• Created to manage resource needs across all uses
• Ensures predictable performance & QoS for all apps
• Enables apps to run “IN” Hadoop rather than “ON”
  – Key to leveraging all other common services of the Hadoop platform: security, data lifecycle management, etc.

Applications Run Natively IN Hadoop

• BATCH (MapReduce)
• INTERACTIVE (Tez)
• STREAMING (Storm, S4,…)
• GRAPH (Giraph)
• IN-MEMORY (Spark)
• HPC MPI (OpenMPI)
• ONLINE (HBase)
• OTHER (Search) (Weave…)

All run on YARN (Cluster Resource Management), over HDFS2 (Redundant, Reliable Storage)

Page 10: Hadoop in the Enterprise - README

Old School Hadoop: MapReduce

Page 11: Hadoop in the Enterprise - README

New School Hadoop with YARN

[Diagram: Clients send Job Submission requests to the ResourceManager. Three NodeManagers host Containers and App Mstr (application master) processes. Labeled flows: MapReduce Status, Job Submission, Node Status, Resource Request.]
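The flows named in the diagram (Job Submission, Resource Request, Node Status) can be walked through with a small simulation. This is a minimal sketch, not the YARN API: the class and method names below are hypothetical, and the placement rule (pick the node with the most free slots) is an assumption made for illustration rather than how YARN's schedulers actually decide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeManager:
    name: str
    capacity: int  # number of containers this node can host
    running: List[str] = field(default_factory=list)

    def free_slots(self) -> int:
        return self.capacity - len(self.running)

    def launch(self, label: str) -> None:
        if self.free_slots() <= 0:
            raise RuntimeError(f"{self.name} is full")
        self.running.append(label)


class ResourceManager:
    def __init__(self, nodes: List[NodeManager]) -> None:
        self.nodes = nodes

    def node_status(self) -> Dict[str, int]:
        """Node Status: free container slots reported per node."""
        return {n.name: n.free_slots() for n in self.nodes}

    def _place(self, label: str) -> str:
        # Toy policy (an assumption): choose the node with the most free slots.
        node = max(self.nodes, key=lambda n: n.free_slots())
        node.launch(label)
        return node.name

    def submit_job(self, job: str) -> "AppMaster":
        """Job Submission: start one App Mstr container for the job."""
        self._place(f"{job}/AppMstr")
        return AppMaster(job, self)

    def request_containers(self, job: str, count: int) -> List[str]:
        """Resource Request: grant up to `count` task containers."""
        granted = []
        for i in range(count):
            if max(self.node_status().values()) == 0:
                break  # cluster is full, grant what we could
            granted.append(self._place(f"{job}/task{i}"))
        return granted


@dataclass
class AppMaster:
    job: str
    rm: ResourceManager
    placements: List[str] = field(default_factory=list)

    def run(self, tasks: int) -> str:
        # The per-job App Mstr, not the ResourceManager, asks for task containers.
        self.placements = self.rm.request_containers(self.job, tasks)
        return f"{self.job}: {len(self.placements)}/{tasks} task containers granted"


rm = ResourceManager([NodeManager("nm1", 2), NodeManager("nm2", 2), NodeManager("nm3", 2)])
am = rm.submit_job("wordcount")
status = am.run(tasks=3)
print(status)
print(rm.node_status())
```

The design point the sketch tries to capture is that the ResourceManager only tracks capacity and grants containers, while each job gets its own App Mstr that decides what to run in them.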

Page 12: Hadoop in the Enterprise - README

5 Key Benefits of YARN

1. Scale
2. Compatibility with MapReduce
3. Improved cluster utilization
4. New programming models
5. Agility

Page 13: Hadoop in the Enterprise - README

Apache Tez

• An alternate data processing framework to MapReduce

•  Improves performance of low-latency applications

Page 14: Hadoop in the Enterprise - README

SQL-IN-Hadoop with Apache Hive

• Apache Hive: First Application to use YARN
• Hive on Tez optimizes resource usage for Hive queries to improve performance
  – Apache Hive is the standard for SQL interaction in Hadoop (most applications claim Hive compatibility today)
  – Apache Tez: a general-purpose processing framework, optimized for YARN, for existing Hadoop applications

Stinger Initiative: Simple Focus

1. Improve existing tools & preserve investments
2. Enable Hive to support interactive workloads

[Diagram: Business Analytics and Custom Apps issue SQL to HIVE, which runs on MAP REDUCE and TEZ, over YARN and HDFS2 in Hadoop]

Stinger Phase 3
• Vector Query
• Buffer Cache
• Query Planner

Stinger Phase 2
• YARN Resource Mgmnt
• Hive on Apache Tez
• Query Service (always on)

Stinger Phase 1
• Base Optimizations
• SQL Analytics
• ORCFile Format

Increased SQL Compatibility

100x Performance Improvement

Page 15: Hadoop in the Enterprise - README

Hive: More SQL & 100X Faster

Stinger Phase 3
• Vector Query
• Buffer Cache
• Query Planner

Stinger Phase 2
• YARN Resource Mgmnt
• Hive on Apache Tez
• Query Service

Stinger Phase 1
• Base Optimizations
• SQL Analytics
• ORCFile Format

We Are Here

SQL Compliance Highlights

• CHAR
• VARCHAR
• DATE
• DECIMAL
• Sub-queries for IN/NOT IN, HAVING
• EXISTS / NOT EXISTS
• INTERSECT, EXCEPT
• UNION DISTINCT and UNION outside of subquery
• ROLLUP and CUBE
• Windowing functions (OVER, RANK, etc.)

Legend: Done in Hive 0.11; Work Started
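For readers unfamiliar with some of the SQL constructs named on this slide, the sketch below shows what a few of them compute. It uses Python's built-in `sqlite3` module purely as a stand-in engine, so it says nothing about Hive's syntax, support level, or performance. The `orders` and `refunds` tables and the `rows` helper are hypothetical, invented for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in SQL engine (not Hive).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders  (customer TEXT, amount INTEGER);
    CREATE TABLE refunds (customer TEXT);
    INSERT INTO orders  VALUES ('ann', 30), ('bob', 20), ('ann', 5), ('cy', 50);
    INSERT INTO refunds VALUES ('bob'), ('dee');
    """
)


def rows(sql: str) -> list:
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Sub-query with IN: customers who ordered and also appear in refunds.
in_subquery = rows(
    "SELECT DISTINCT customer FROM orders "
    "WHERE customer IN (SELECT customer FROM refunds) ORDER BY customer"
)

# NOT EXISTS: customers who ordered and never appear in refunds.
not_exists = rows(
    "SELECT DISTINCT customer FROM orders o "
    "WHERE NOT EXISTS (SELECT 1 FROM refunds r WHERE r.customer = o.customer) "
    "ORDER BY customer"
)

# INTERSECT / EXCEPT: set operations over the two customer columns.
intersect = rows("SELECT customer FROM orders INTERSECT SELECT customer FROM refunds")
except_ = rows("SELECT customer FROM refunds EXCEPT SELECT customer FROM orders")

# Sub-query in HAVING: customers whose total exceeds the average order amount.
having = rows(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer "
    "HAVING SUM(amount) > (SELECT AVG(amount) FROM orders) ORDER BY customer"
)

print(in_subquery, not_exists, intersect, except_, having)
```

Equivalent HiveQL may differ in details, so the queries should be read as illustrations of the semantics rather than as statements to paste into Hive.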

Page 16: Hadoop in the Enterprise - README

Hive’s Performance Trajectory

http://hortonworks.com/blog/delivering-on-stinger-a-phase-3-progress-update/

Page 17: Hadoop in the Enterprise - README

Making Hadoop Enterprise Ready

Page 18: Hadoop in the Enterprise - README

Thank You!

http://hortonworks.com/sandbox