Welcome to Hadoop2Land!

DESCRIPTION

This talk takes you on a rollercoaster ride through Hadoop 2 and explains its most significant changes and components. It was held at the JavaLand conference in Brühl, Germany, on 25.03.2014. Agenda: Welcome Office, YARN Land, HDFS 2 Land, YARN App Land, Enterprise Land.


Page 1: Welcome to Hadoop2Land!

25.03.2014

uweseiler

Page 2: Your Travel Guide

Big Data Nerd · Travelpirate · Photography Enthusiast · Hadoop Trainer · NoSQL Fan Boy

Page 3: Your Travel Agency

specializes in…

Big Data Nerds · Agile Ninjas · Continuous Delivery Gurus · Enterprise Java Specialists · Performance Geeks

Join us!

Page 4: Your Travel Destination

[Map: Welcome Office → YARN Land → HDFS 2 Land → YARN App Land → Enterprise Land → Goodbye Photo]

Page 5: Stop #1: Welcome Office

Page 6: In the beginning of Hadoop…there was MapReduce

• It could handle data sizes way beyond those of its competitors

• It was resilient in the face of failure

• It made it easy for users to bring their code and algorithms to the data

Page 7: …but it was too low level (Hadoop 1, 2007)

Page 8: …but it was too rigid (Hadoop 1, 2007)

Page 9: …but it was Batch (Hadoop 1, 2007)

[Diagram: several single-purpose batch apps, each running its own job against HDFS]

Page 10: …but it had limitations (Hadoop 1, 2007)

• Scalability
  – Maximum cluster size ~4,500 nodes
  – Maximum ~40,000 concurrent tasks
  – Coarse synchronization in the JobTracker

• Availability
  – A JobTracker failure kills all queued and running jobs

• Hard partition of resources into map & reduce slots
  – Low resource utilization

• Lacks support for alternate paradigms and services

Page 11: YARN to the rescue! (Hadoop 2, 2013)

Page 12: Stop #2: YARN Land

Page 13: A brief history of Hadoop 2

• Originally conceived & architected by the team at Yahoo!
  – Arun Murthy created the original JIRA in 2008 and is now the Hadoop 2 release manager

• The community has been working on Hadoop 2 for over 4 years

• Hadoop 2-based architecture running at scale at Yahoo!
  – Deployed on 35,000+ nodes for 6+ months

Page 14: Hadoop 2: Next-gen platform

Hadoop 1, a single-use system (batch apps):
• HDFS: redundant, reliable storage
• MapReduce: cluster resource management + data processing

Hadoop 2, a multi-purpose platform (batch, interactive, streaming, …):
• HDFS 2: redundant, reliable storage
• YARN: cluster resource management
• MapReduce and others: data processing

Page 15: Taking Hadoop beyond batch

Applications run natively in Hadoop: store all data in one place, interact with it in multiple ways.

On top of HDFS 2 (redundant, reliable storage) and YARN (cluster resource management):
• Batch: MapReduce
• Interactive: Tez
• Online: HOYA
• Streaming: Storm, …
• Graph: Giraph
• In-Memory: Spark
• Other: Search, …

Page 16: YARN: Design Goals

• Build a new abstraction layer by splitting up the two major functions of the JobTracker
  – Cluster resource management
  – Application life-cycle management

• Allow other processing paradigms
  – Flexible API for implementing YARN apps
  – MapReduce becomes just another YARN app
  – Lots of different YARN apps

Page 17: Hadoop: BigDataOS™

Traditional operating system:
• Storage: file system
• Execution/scheduling: processes/kernel

Hadoop:
• Storage: HDFS
• Execution/scheduling: YARN

Page 18: YARN: Architectural Overview

Split up the two major functions of the JobTracker: cluster resource management & application life-cycle management.

[Diagram: a ResourceManager with its Scheduler over eight NodeManagers; ApplicationMaster 1 runs containers 1.1 and 1.2, ApplicationMaster 2 runs containers 2.1-2.3, all spread across the nodes]

Page 19: YARN: Multi-tenancy I

Different types of applications on the same cluster.

[Diagram: one ResourceManager/Scheduler over twelve NodeManagers hosting, side by side: two MapReduce jobs with their map and reduce containers, a Tez job with vertices 1-4, a HOYA cluster (HBase Master plus three RegionServers) and a Storm topology (nimbus containers)]

Page 20: YARN: Multi-tenancy II

Different users and organizations on the same cluster.

[Diagram: the same cluster, plus a hierarchical scheduler queue tree: root splits into DWH, User and Ad-Hoc queues (30% / 60% / 10%), with organizations split further into Dev/Prod and Dev1/Dev2 sub-queues (20% / 80%, 25% / 75%, 60% / 40%)]
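The queue tree above composes multiplicatively: a queue's absolute share of the cluster is the product of the relative capacities along its path from root. A minimal sketch of that arithmetic; the queue names and percentages are illustrative, loosely following the slide, not a real scheduler configuration:

```python
# Sketch of how hierarchical scheduler queue shares compose: a queue's
# absolute cluster share is the product of relative capacities on its path.
# Names and percentages are illustrative only.

QUEUES = {
    "root": 1.0,
    "root.DWH": 0.30,
    "root.User": 0.60,
    "root.Ad-Hoc": 0.10,
    "root.DWH.Dev": 0.20,
    "root.DWH.Prod": 0.80,
}

def absolute_capacity(path):
    """Multiply the relative capacities from root down to the queue."""
    parts = path.split(".")
    share = 1.0
    for i in range(1, len(parts) + 1):
        share *= QUEUES[".".join(parts[:i])]
    return round(share, 4)

print(absolute_capacity("root.DWH.Prod"))  # 0.24 -> 24% of the cluster
```

So the DWH production queue here would own 30% × 80% = 24% of the cluster, and sibling shares at each level sum to 100%.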

Page 21: YARN: Your own app

[Diagram: the client ("MyApp") uses YARNClient and the ApplicationClient Protocol to submit the app to the ResourceManager/Scheduler; the application master ("myAM") runs in a container and uses AMRMClient (ApplicationMaster Protocol) to request containers and NMClient (Container Management Protocol) to launch them; the client talks to its AM over an app-specific API]

DistributedShell is the new WordCount!

Page 22: Stop #3: HDFS 2 Land

Page 23: HDFS 2: In a nutshell

• Removes tight coupling of Block Storage and Namespace

• Adds (built-in) High Availability

• Better Scalability & Isolation

• Increased performance

Details: https://issues.apache.org/jira/browse/HDFS-1052

Page 24: HDFS 2: Federation

• NameNodes do not talk to each other

• Each NameNode manages only a slice of the namespace

• DataNodes can store blocks managed by any NameNode

• Block storage becomes a generic storage service: each namespace has its own block pool

• Horizontally scales IO and storage

[Diagram: NameNode 1 and NameNode 2, each with its own namespace state and block map (block pools 1 and 2), over shared DataNodes with JBOD disks holding blocks from both pools]

Page 25: HDFS 2: Architecture

[Diagram: the Active NameNode maintains the block map and edits file; the Standby NameNode simultaneously reads and applies the edits. DataNodes report to both NameNodes but take orders only from the Active. Shared state lives either on NFS or in quorum-based storage across three JournalNodes]

Page 26: HDFS 2: High Availability

[Diagram: the Active/Standby NameNode pair shares state via the JournalNodes; DataNodes send heartbeats & block reports to both NameNodes. A ZKFailoverController next to each NameNode monitors the health of NameNode, OS and hardware, sends heartbeats to a three-node ZooKeeper ensemble, and the Active's controller holds a special lock znode]
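The quorum-based storage option works because each edit only has to be acknowledged by a majority of JournalNodes, so a minority can fail without losing availability. A minimal sketch of the arithmetic:

```python
# Sketch: quorum-based edit storage needs a majority of JournalNodes to
# acknowledge each write, so N nodes tolerate floor((N - 1) / 2) failures.

def quorum(n):
    """Smallest majority out of n JournalNodes."""
    return n // 2 + 1

def tolerated_failures(n):
    return n - quorum(n)

for n in (3, 5):
    print(f"{n} JournalNodes: quorum {quorum(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

This is why JournalNode ensembles are deployed in odd sizes: going from 3 to 4 nodes raises the quorum without tolerating any additional failure.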

Page 27: HDFS 2: Write-Pipeline

• Earlier versions of HDFS
  – Files were immutable
  – Write-once-read-many model

• New features in HDFS 2
  – Files can be reopened for append
  – New primitives: hflush and hsync
  – Replace DataNode on failure
  – Read consistency

[Diagram: a writer streams data through a pipeline of DataNodes 1-3 and adds a new node to the pipeline on failure; a reader can read from any node and then fail over to any other node]

Page 28: HDFS 2: Snapshots

• Admins can create point-in-time snapshots of HDFS
  – Of the entire file system
  – Of a specific data set (a sub-tree directory of the file system)

• Restore the state of the entire file system or a data set to a snapshot (like Apple Time Machine)
  – Protects against user errors

• Snapshot diffs identify changes made to a data set
  – Keep track of how raw or derived/analytical data changes over time
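Conceptually, a snapshot diff reports what was created, deleted or modified between two read-only copies of the namespace. A toy model of that report; the paths and "contents" are made up for illustration, and this is not the HDFS API:

```python
# Sketch of what a snapshot diff reports: files created, deleted or
# modified between two point-in-time copies of the namespace.

def snapshot_diff(old, new):
    created = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return {"created": created, "deleted": deleted, "modified": modified}

# Two illustrative "snapshots" mapping path -> content version
s0 = {"/weblogs/a.log": "v1", "/weblogs/b.log": "v1"}
s1 = {"/weblogs/a.log": "v2", "/derived/c.out": "v1"}

print(snapshot_diff(s0, s1))
```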

Page 29: HDFS 2: NFS Gateway

• Supports NFS v3 (NFS v4 is work in progress)

• Supports all HDFS commands
  – List files
  – Copy, move files
  – Create and delete directories

• Ingest for large-scale analytical workloads
  – Load immutable files as source for analytical processing
  – No random writes

• Stream files into HDFS
  – Log ingest by applications writing directly to an HDFS client mount

Page 30: HDFS 2: Performance

• Many improvements
  – New appendable write-pipeline
  – Read-path improvements for fewer memory copies
  – Short-circuit local reads for 2-3x faster random reads
  – I/O improvements using posix_fadvise()
  – libhdfs improvements for zero-copy reads

• Significant improvements overall: I/O 2.5-5x faster

Page 31: Stop #4: YARN App Land

Page 32: YARN Apps: Overview

• MapReduce 2: Batch
• Tez: DAG Processing
• Storm: Stream Processing
• Samza: Stream Processing
• Apache S4: Stream Processing
• Spark: In-Memory
• HOYA: HBase on YARN
• Apache Giraph: Graph Processing
• Apache Hama: Bulk Synchronous Parallel
• Elastic Search: Scalable Search


Page 34: MapReduce 2: In a nutshell

• MapReduce is now a YARN app
  – No more map and reduce slots, it's containers now
  – No more JobTracker, it's the YarnAppmaster library now

• Multiple versions of MapReduce
  – The older mapred APIs work without modification or recompilation
  – The newer mapreduce APIs may need to be recompiled

• Still has one master server component: the Job History Server
  – Stores the execution history of jobs
  – Used to audit prior execution of jobs
  – Will also be used by the YARN framework to store charge-backs at that level

• Better cluster utilization

• Increased scalability & availability
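The map/shuffle/reduce contract itself is unchanged by MR2; only the execution layer moved to YARN containers. A toy in-memory sketch of that flow (pure illustration, not the Hadoop API), run here as the classic word count:

```python
# Sketch of the map -> shuffle -> reduce flow that MR2 still implements,
# now inside YARN containers instead of fixed slots.
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    # Map phase: each record yields (key, value) pairs.
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)   # shuffle: group values by key
    # Reduce phase: one reducer call per key with all its values.
    return {key: reducer(key, values) for key, values in shuffled.items()}

lines = ["yarn runs mapreduce", "mapreduce is a yarn app"]
counts = run_mapreduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda key, values: sum(values),
)
print(counts["yarn"], counts["mapreduce"])  # 2 2
```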

Page 35: MapReduce 2: Shuffle

• Faster shuffle
  – Better embedded server: Netty

• Encrypted shuffle
  – Secures the shuffle phase as data moves across the cluster
  – Requires two-way HTTPS with certificates on both sides
  – Causes significant CPU overhead; reserve one core for this work
  – Certificates are stored on each node (provisioned with the cluster) and refreshed every 10 secs

• Pluggable shuffle/sort
  – Shuffle is the first phase in MapReduce that is guaranteed not to be data-local
  – Pluggable shuffle/sort allows application or hardware developers to intercept the network-heavy workload and optimize it
  – Typical implementations combine hardware components (like fast networks) with software components (like sorting algorithms)
  – The API will change with future versions of Hadoop

Page 36: MapReduce 2: Performance

• Key optimizations
  – No hard segmentation of resources into map and reduce slots
  – The YARN scheduler is more efficient
  – The MR2 framework is more efficient than MR1: the shuffle phase in MRv2 is faster thanks to Netty

• In production: 40,000+ nodes running YARN across over 365 PB of data
  – About 400,000 jobs per day for about 10 million hours of compute time
  – Estimated 60%-150% improvement in node usage per day
  – Got rid of a whole 10,000-node datacenter because of the increased utilization


Page 38: Apache Tez: In a nutshell

• Distributed execution framework that works on computations represented as dataflow graphs

• Tez is Hindi for “speed”

• Naturally maps to execution plans produced by query optimizers

• Highly customizable to meet a broad spectrum of use cases and to enable dynamic performance optimizations at runtime

• Built on top of YARN
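The dataflow-graph idea can be sketched with a tiny scheduler: each vertex may run only after all of its inputs, i.e. in topological order. The vertex names below are made up for illustration and do not come from Tez:

```python
# Sketch: executing a dataflow DAG means running vertices in an order
# where every vertex's inputs finish first (a topological order).
from graphlib import TopologicalSorter

# vertex -> set of vertices it depends on (illustrative query plan shape)
dag = {
    "map1": set(),
    "map2": set(),
    "join": {"map1", "map2"},
    "groupby": {"join"},
}

order = list(TopologicalSorter(dag).static_order())
print(order)  # both maps first, then join, then groupby
```

A real Tez ApplicationMaster does the same ordering, but also streams data between vertices and re-plans at runtime.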

Page 39: Apache Tez: Architecture

• Task with pluggable Input, Processor & Output

• A YARN ApplicationMaster runs the DAG of Tez tasks

[Diagram: a "classical" map as a task wired HDFSInput → MapProcessor → SortedOutput; a "classical" reduce as ShuffleInput → ReduceProcessor → HDFSOutput]

Page 40: Apache Tez: Tez Service

• MapReduce query startup is expensive
  – Job-launch & task-launch latencies are fatal for short queries (on the order of 5s to 30s)

• Solution: the Tez Service (= preallocated ApplicationMaster)
  – Removes job-launch overhead (ApplicationMaster)
  – Removes task-launch overhead (pre-warmed containers)
  – Hive (or Pig) submits its query plan to the Tez Service
  – A native Hadoop service, not ad-hoc

Page 41: Apache Tez: The new primitive

Hadoop 1, MapReduce as the base:
• HDFS: redundant, reliable storage
• MapReduce: cluster resource management + data processing
• Pig, Hive and others run on top of MapReduce

Hadoop 2, Apache Tez as the base:
• HDFS: redundant, reliable storage
• YARN: cluster resource management
• Tez: execution engine for MR, Pig and Hive
• Realtime engines (Storm) and others run alongside on YARN

Page 42: Apache Tez: Performance

SELECT a.state, COUNT(*), AVERAGE(c.price)
FROM a
JOIN b ON (a.id = b.id)
JOIN c ON (a.itemId = c.itemId)
GROUP BY a.state

Existing Hive: parse query 0.5s, create plan 0.5s, launch MapReduce 20s, process MapReduce 10s; total 31s

Hive/Tez: parse query 0.5s, create plan 0.5s, launch MapReduce 20s, process MapReduce 2s; total 23s

Tez & Hive Service: parse query 0.5s, create plan 0.5s, submit to Tez Service 0.5s, process MapReduce 2s; total 3.5s

* No exact numbers, for illustration only

Page 43: Stinger Initiative: In a nutshell

Page 44: Stinger: Overall Performance

* Real numbers, but handle with care!


Page 46: Storm: In a nutshell

• Stream-processing

• Real-time processing

• Developed as standalone application• https://github.com/nathanmarz/storm

• Ported on YARN• https://github.com/yahoo/storm-yarn

Page 47: Storm: Conceptual view

• Spout: source of streams

• Bolt: consumer of streams; processes tuples and possibly emits new tuples

• Tuple: list of name-value pairs

• Stream: unbounded sequence of tuples

• Topology: network of spouts & bolts as the nodes and streams as the edges

[Diagram: three spouts emitting tuples into a network of five bolts]
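These concepts fit in a few lines. A toy synchronous topology (one spout feeding two chained bolts); the names are made up, and real Storm runs bolts in parallel over unbounded streams:

```python
# Sketch: a spout emits tuples, bolts consume them and may emit new ones.
from collections import Counter

def sentence_spout():
    # Source of the stream; a real spout would emit forever.
    yield ("storm runs on yarn",)
    yield ("tuples flow through bolts",)

def split_bolt(tup):
    # Bolt 1: consumes a sentence tuple, emits one tuple per word.
    for word in tup[0].split():
        yield (word,)

counts = Counter()
def count_bolt(tup):
    # Bolt 2: terminal bolt, aggregates word counts.
    counts[tup[0]] += 1

for tup in sentence_spout():           # stream: spout -> split bolt
    for word_tup in split_bolt(tup):   # stream: split bolt -> count bolt
        count_bolt(word_tup)

print(counts["storm"], sum(counts.values()))  # 1 8
```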


Page 49: Spark: In a nutshell

• High-speed in-memory analytics over Hadoop and Hive

• Separate MapReduce-like engine
  – Speedup of up to 100x
  – On-disk queries 5-10x faster

• Spark is now a top-level Apache project: http://spark.apache.org

• Compatible with Hadoop's Storage API

• Spark can be run on top of YARN: http://spark.apache.org/docs/0.9.0/running-on-yarn.html

Page 50: Spark: RDD

• Key idea: Resilient Distributed Datasets (RDDs)

• Read-only partitioned collection of records
  – Optionally cached in memory across the cluster

• Manipulated through parallel operators
  – Supports only coarse-grained operations: map, reduce and group-by transformations

• Automatically recomputed on failure

[Diagram: an RDD A split into partitions A1.1, A1.2, A1.3]
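"Automatically recomputed on failure" works because an RDD remembers its lineage: a lost partition is rebuilt by re-applying the transformations to its parent's data instead of being restored from a replica. A minimal model of that idea (not the Spark API):

```python
# Sketch: an RDD is a read-only partitioned dataset that records how it
# was derived, so any partition can be recomputed from its lineage.

class RDD:
    def __init__(self, partitions, parent=None, fn=None):
        self._parts = partitions        # list of lists, or None if derived
        self.parent, self.fn = parent, fn

    def map(self, fn):
        # Coarse-grained operator: records lineage, computes nothing yet.
        return RDD(None, parent=self, fn=fn)

    def compute(self, i):
        if self._parts is not None:     # base data is materialized
            return self._parts[i]
        # Derived partition: recompute from the parent via the lineage fn.
        return [self.fn(x) for x in self.parent.compute(i)]

    def collect(self, n_parts):
        return [x for i in range(n_parts) for x in self.compute(i)]

base = RDD([[1, 2], [3, 4]])
doubled = base.map(lambda x: x * 2)
print(doubled.collect(2))  # [2, 4, 6, 8]
```

Losing a cached copy of `doubled`'s second partition costs only one call to `compute(1)`, which replays the map over the parent's partition.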

Page 51: Spark: Data Sharing


Page 53: HOYA: In a nutshell

• Create on-demand HBase clusters
  – Small HBase clusters in a large YARN cluster
  – Dynamic HBase clusters
  – Self-healing HBase clusters
  – Elastic HBase clusters
  – Transient/intermittent clusters for workflows

• Configure custom configurations & versions

• Better isolation

• More efficient utilization/sharing of the cluster

Page 54: HOYA: Creation of AppMaster

[Diagram: the HOYA client uses YARNClient (and a HOYA-specific API) to ask the ResourceManager/Scheduler to launch the HOYA ApplicationMaster in a container on a NodeManager]

Page 55: HOYA: Deployment of HBase

[Diagram: the HOYA ApplicationMaster requests further containers and deploys the HBase Master and RegionServers into them]

Page 56: HOYA: Bind via ZooKeeper

[Diagram: the HBase client locates the dynamically deployed HBase Master and RegionServers via ZooKeeper]


Page 58: Giraph: In a nutshell

• Giraph is a framework for processing semi-structured graph data on a massive scale

• Giraph is loosely based upon Google's Pregel; both systems are inspired by the Bulk Synchronous Parallel model

• Giraph performs iterative calculations on top of an existing Hadoop cluster
  – Uses a single map-only job

• Apache top-level project since 2012: http://giraph.apache.org
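The Bulk Synchronous Parallel model is easy to sketch: every vertex computes, messages are delivered at a superstep barrier, and the loop repeats until no vertex changes. A classic toy example, propagating the maximum vertex value through a small graph (the graph itself is made up):

```python
# Sketch of BSP supersteps: compute per vertex, exchange messages at the
# barrier, halt when a superstep changes nothing.

edges = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}   # undirected path 1-2-3-4
value = {1: 3, 2: 6, 3: 2, 4: 1}                 # initial vertex values

supersteps = 0
changed = True
while changed:                                   # one iteration = one superstep
    changed = False
    # Barrier: messages are the neighbors' values from BEFORE this superstep.
    inbox = {v: [value[u] for u in edges[v]] for v in edges}
    for v, msgs in inbox.items():
        new = max([value[v]] + msgs)             # vertex compute step
        if new != value[v]:
            value[v], changed = new, True
    supersteps += 1

print(value)  # every vertex converges to the global maximum, 6
```

The value 6 needs two supersteps to reach vertex 4 (one hop per superstep), plus one final superstep that detects no change and halts.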

Page 59: Stop #5: Enterprise Land

Page 60: Falcon: In a nutshell

• A framework for managing data processing in Hadoop clusters

• Falcon runs as a standalone server as part of the Hadoop cluster

• Key features
  – Data replication handling
  – Data lifecycle management
  – Process coordination & scheduling
  – Declarative data-process programming

• Apache incubation status: http://falcon.incubator.apache.org

Page 61: Falcon: One-stop Shop

Data management needs (data processing, replication, retention, scheduling, reprocessing, multi-cluster management) are met by orchestrating existing tools: Oozie, Sqoop, DistCp, Flume, MapReduce, Pig & Hive.

Page 62: Falcon: Weblog Use Case

• Weblogs saved hourly to the primary cluster
  – HDFS location is /weblogs/{date}

• Desired data policy
  – Replicate weblogs to the secondary cluster
  – Evict weblogs from the primary cluster after 2 days
  – Evict weblogs from the secondary cluster after 1 week

Page 63: Falcon: Weblog Use Case

<feed description="" name="feed-weblogs" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>

  <clusters>
    <cluster name="cluster-primary" type="source">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(2)" action="delete"/>
    </cluster>
    <cluster name="cluster-secondary" type="target">
      <validity start="2012-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="days(7)" action="delete"/>
    </cluster>
  </clusters>

  <locations>
    <location type="data" path="/weblogs/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
  </locations>

  <ACL owner="hdfs" group="users" permission="0755"/>
  <schema location="/none" provider="none"/>
</feed>
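Because the feed definition is plain XML, its retention policy can be inspected mechanically. A small sketch using only the standard library, parsing a trimmed copy of the feed above (the cluster names and limits match the example; the trimming is for brevity):

```python
# Sketch: extract each cluster's retention limit from a Falcon feed entity.
import xml.etree.ElementTree as ET

feed_xml = """
<feed name="feed-weblogs" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>
  <clusters>
    <cluster name="cluster-primary" type="source">
      <retention limit="days(2)" action="delete"/>
    </cluster>
    <cluster name="cluster-secondary" type="target">
      <retention limit="days(7)" action="delete"/>
    </cluster>
  </clusters>
</feed>
"""

ns = {"f": "uri:falcon:feed:0.1"}          # the feed's XML namespace
root = ET.fromstring(feed_xml)
policy = {
    c.get("name"): c.find("f:retention", ns).get("limit")
    for c in root.findall("f:clusters/f:cluster", ns)
}
print(policy)  # {'cluster-primary': 'days(2)', 'cluster-secondary': 'days(7)'}
```

The output matches the stated policy: 2 days on the primary cluster, 1 week on the secondary.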

Page 64: Knox: In a nutshell

• A system that provides a single point of authentication and access for Apache Hadoop services in a cluster

• The gateway runs as a server (or a cluster of servers) providing centralized access to one or more Hadoop clusters

• The goal is to simplify Hadoop security for both users and operators

• Apache incubation status: http://knox.incubator.apache.org

Page 65: Knox: Architecture

[Diagram: clients (a browser via an Ambari/Hue server, an Ambari client, a REST client, a JDBC client) reach the Knox gateway cluster, which runs in a DMZ between two firewalls; the gateway authenticates against identity providers (a Kerberos SSO provider) and forwards requests to two secure Hadoop clusters behind it]

Page 66: Final Stop: Goodbye Photo

[Map: Welcome Office → YARN Land → HDFS 2 Land → YARN App Land → Enterprise Land → Goodbye Photo]

Page 67: Hadoop 2: Summary

1. Scale

2. New programming models & services

3. Beyond Java

4. Improved cluster utilization

5. Enterprise Readiness

Page 68: One more thing…

Let’s get started with Hadoop 2!

Page 69: Hortonworks Sandbox 2.0

http://hortonworks.com/products/hortonworks-sandbox

Page 70: Trainings

Developer Training
• 19.-22.05.2014, Düsseldorf
• 30.06.-03.07.2014, München
• 04.-07.08.2014, Frankfurt

Admin Training
• 26.-28.05.2014, Düsseldorf
• 07.-09.07.2014, München
• 11.-13.08.2014, Frankfurt

Details: https://www.codecentric.de/schulungen-und-workshops/

Page 71: Thanks for traveling with me